Installation
This library requires Python 3.10 or higher.
Prerequisites
Whichever OS you are running, you will need to install at least one of TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:

- TensorFlow 2 installation page
- PyTorch installation page
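For a minimal setup, a sketch of installing either backend with pip (CPU-only builds assumed here; refer to the pages above for GPU or platform-specific variants):

    # install one deep-learning backend before installing docTR
    pip install tensorflow
    # or
    pip install torch torchvision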
Via Python Package

Install the latest stable release of the package from PyPI. We strive towards reducing framework-specific dependencies to a minimum, but some features are borrowed from them, so framework-specific builds are available as package extras.
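A sketch of the corresponding pip commands (the package name on PyPI is python-doctr; the [tf] and [torch] extras select the framework-specific builds mentioned above, and optional extras such as viz, html, and contrib are assumptions to verify against the package metadata):

    pip install python-doctr

    # framework-specific builds
    pip install "python-doctr[tf]"
    pip install "python-doctr[torch]"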
Via Git

Install the library in developer mode if you want the latest features or wish to modify the source.
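A minimal sketch, assuming the upstream repository lives at github.com/mindee/doctr (as referenced elsewhere in these docs):

    git clone https://github.com/mindee/doctr.git
    pip install -e doctr/.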
Via Conda (Only for Linux)

Install the latest release of the package using conda:
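A sketch of the conda commands; the strict channel priority setting and the techMindee channel are taken from the terms indexed for this page, but treat them as assumptions to verify against the project's current conda distribution:

    conda config --set channel_priority strict
    conda install -c techMindee python-doctr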
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/latest/using_doctr/custom_models_training.html b/latest/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/latest/using_doctr/custom_models_training.html
+++ b/latest/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/latest/using_doctr/running_on_aws.html b/latest/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/latest/using_doctr/running_on_aws.html
+++ b/latest/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/latest/using_doctr/sharing_models.html b/latest/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/latest/using_doctr/sharing_models.html
+++ b/latest/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/latest/using_doctr/using_contrib_modules.html b/latest/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/latest/using_doctr/using_contrib_modules.html
+++ b/latest/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/latest/using_doctr/using_datasets.html b/latest/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/latest/using_doctr/using_datasets.html
+++ b/latest/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/latest/using_doctr/using_model_export.html b/latest/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/latest/using_doctr/using_model_export.html
+++ b/latest/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/latest/using_doctr/using_models.html b/latest/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/latest/using_doctr/using_models.html
+++ b/latest/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/modules/contrib.html b/modules/contrib.html
index 22b0c508a6..b8878635b6 100644
--- a/modules/contrib.html
+++ b/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -376,7 +376,7 @@ Supported contribution modules
-
+
diff --git a/modules/datasets.html b/modules/datasets.html
index 0fe4b78d48..dfcacbc96e 100644
--- a/modules/datasets.html
+++ b/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1077,7 +1077,7 @@ Returns:
-
+
diff --git a/modules/io.html b/modules/io.html
index 924d292c59..77e9e017bf 100644
--- a/modules/io.html
+++ b/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -756,7 +756,7 @@ Returns:¶
-
+
diff --git a/modules/models.html b/modules/models.html
index bf45d11a71..f4a9833365 100644
--- a/modules/models.html
+++ b/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1598,7 +1598,7 @@ Args:¶
-
+
diff --git a/modules/transforms.html b/modules/transforms.html
index 6d77d16e7b..bc254c867b 100644
--- a/modules/transforms.html
+++ b/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -831,7 +831,7 @@ Args:¶
-
+
diff --git a/modules/utils.html b/modules/utils.html
index 3dd3ecbd96..6784d81f6f 100644
--- a/modules/utils.html
+++ b/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -711,7 +711,7 @@ Args:¶
-
+
diff --git a/notebooks.html b/notebooks.html
index f3ea994e49..647f73d4eb 100644
--- a/notebooks.html
+++ b/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -387,7 +387,7 @@ docTR Notebooks
-
+
diff --git a/search.html b/search.html
index f0693e2c97..0e0da5efb3 100644
--- a/search.html
+++ b/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -336,7 +336,7 @@
-
+
diff --git a/searchindex.js b/searchindex.js
index 8598997441..df18967072 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[1, "correction"]], "2. Warning": [[1, "warning"]], "3. Temporary Ban": [[1, "temporary-ban"]], "4. Permanent Ban": [[1, "permanent-ban"]], "AWS Lambda": [[13, null]], "Advanced options": [[18, "advanced-options"]], "Args:": [[6, "args"], [6, "id4"], [6, "id7"], [6, "id10"], [6, "id13"], [6, "id16"], [6, "id19"], [6, "id22"], [6, "id25"], [6, "id29"], [6, "id32"], [6, "id37"], [6, "id40"], [6, "id46"], [6, "id49"], [6, "id50"], [6, "id51"], [6, "id54"], [6, "id57"], [6, "id60"], [6, "id61"], [7, "args"], [7, "id2"], [7, "id3"], [7, "id4"], [7, "id5"], [7, "id6"], [7, "id7"], [7, "id10"], [7, "id12"], [7, "id14"], [7, "id16"], [7, "id20"], [7, "id24"], [7, "id28"], [8, "args"], [8, "id3"], [8, "id8"], [8, "id13"], [8, "id17"], [8, "id21"], [8, "id26"], [8, "id31"], [8, "id36"], [8, "id41"], [8, "id46"], [8, "id50"], [8, "id54"], [8, "id59"], [8, "id63"], [8, "id68"], [8, "id73"], [8, "id77"], [8, "id81"], [8, "id85"], [8, "id90"], [8, "id95"], [8, "id99"], [8, "id104"], [8, "id109"], [8, "id114"], [8, "id119"], [8, "id123"], [8, "id127"], [8, "id132"], [8, "id137"], [8, "id142"], [8, "id146"], [8, "id150"], [8, "id155"], [8, "id159"], [8, "id163"], [8, "id167"], [8, "id169"], [8, "id171"], [8, "id173"], [9, "args"], [9, "id1"], [9, "id2"], [9, "id3"], [9, "id4"], [9, "id5"], [9, "id6"], [9, "id7"], [9, "id8"], [9, "id9"], [9, "id10"], [9, "id11"], [9, "id12"], [9, "id13"], [9, "id14"], [9, "id15"], [9, "id16"], [9, "id17"], [9, "id18"], [9, "id19"], [10, "args"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"]], "Artefact": [[7, "artefact"]], "ArtefactDetection": [[15, "artefactdetection"]], "Attribution": [[1, "attribution"]], "Available Datasets": [[16, "available-datasets"]], "Available architectures": [[18, "available-architectures"], [18, "id1"], [18, "id2"]], "Available contribution modules": [[15, "available-contribution-modules"]], "Block": [[7, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[16, null]], "Choosing the right model": [[18, null]], "Classification": [[14, "classification"]], "Code quality": [[2, "code-quality"]], "Code style verification": [[2, "code-style-verification"]], "Codebase structure": [[2, "codebase-structure"]], "Commits": [[2, "commits"]], "Composing transformations": [[9, "composing-transformations"]], "Continuous Integration": [[2, "continuous-integration"]], "Contributing to docTR": [[2, null]], "Contributor Covenant Code of Conduct": [[1, null]], "Custom dataset loader": [[6, "custom-dataset-loader"]], "Custom orientation classification models": [[12, "custom-orientation-classification-models"]], "Data Loading": [[16, "data-loading"]], "Dataloader": [[6, "dataloader"]], "Detection": [[14, "detection"], [16, "detection"]], "Detection predictors": [[18, "detection-predictors"]], "Developer mode installation": [[2, "developer-mode-installation"]], "Developing docTR": [[2, "developing-doctr"]], "Document": [[7, "document"]], "Document structure": [[7, "document-structure"]], "End-to-End OCR": [[18, "end-to-end-ocr"]], "Enforcement": [[1, "enforcement"]], "Enforcement Guidelines": [[1, "enforcement-guidelines"]], "Enforcement Responsibilities": [[1, "enforcement-responsibilities"]], "Export to ONNX": [[17, "export-to-onnx"]], "Feature requests & bug report": [[2, "feature-requests-bug-report"]], "Feedback": [[2, "feedback"]], "File reading": [[7, "file-reading"]], "Half-precision": [[17, "half-precision"]], "Installation": [[3, null]], 
"Integrate contributions into your pipeline": [[15, null]], "Let\u2019s connect": [[2, "let-s-connect"]], "Line": [[7, "line"]], "Loading from Huggingface Hub": [[14, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[12, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[12, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[4, "main-features"]], "Model optimization": [[17, "model-optimization"]], "Model zoo": [[4, "model-zoo"]], "Modifying the documentation": [[2, "modifying-the-documentation"]], "Naming conventions": [[14, "naming-conventions"]], "OCR": [[16, "ocr"]], "Object Detection": [[16, "object-detection"]], "Our Pledge": [[1, "our-pledge"]], "Our Standards": [[1, "our-standards"]], "Page": [[7, "page"]], "Preparing your model for inference": [[17, null]], "Prerequisites": [[3, "prerequisites"]], "Pretrained community models": [[14, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[14, "pushing-to-the-huggingface-hub"]], "Questions": [[2, "questions"]], "Recognition": [[14, "recognition"], [16, "recognition"]], "Recognition predictors": [[18, "recognition-predictors"]], "Returns:": [[6, "returns"], [7, "returns"], [7, "id11"], [7, "id13"], [7, "id15"], [7, "id19"], [7, "id23"], [7, "id27"], [7, "id31"], [8, "returns"], [8, "id6"], [8, "id11"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id29"], [8, "id34"], [8, "id39"], [8, "id44"], [8, "id49"], [8, "id53"], [8, "id57"], [8, "id62"], [8, "id66"], [8, "id71"], [8, "id76"], [8, "id80"], [8, "id84"], [8, "id88"], [8, "id93"], [8, "id98"], [8, "id102"], [8, "id107"], [8, "id112"], [8, "id117"], [8, "id122"], [8, "id126"], [8, "id130"], [8, "id135"], [8, "id140"], [8, "id145"], [8, "id149"], [8, "id153"], [8, "id158"], [8, "id162"], [8, "id166"], [8, "id168"], [8, "id170"], [8, "id172"], [10, "returns"]], "Scope": [[1, "scope"]], "Share your model with the community": [[14, null]], "Supported Vocabs": [[6, "supported-vocabs"]], "Supported contribution modules": [[5, "supported-contribution-modules"]], "Supported datasets": [[4, "supported-datasets"]], "Supported transformations": [[9, "supported-transformations"]], "Synthetic dataset generator": [[6, "synthetic-dataset-generator"], [16, "synthetic-dataset-generator"]], "Task evaluation": [[10, "task-evaluation"]], "Text Detection": [[18, "text-detection"]], "Text Recognition": [[18, "text-recognition"]], "Text detection models": [[4, "text-detection-models"]], "Text recognition models": [[4, "text-recognition-models"]], "Train your own model": [[12, null]], "Two-stage approaches": [[18, "two-stage-approaches"]], "Unit tests": [[2, "unit-tests"]], "Use your own datasets": [[16, "use-your-own-datasets"]], "Using your ONNX exported model": [[17, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[3, "via-conda-only-for-linux"]], "Via Git": [[3, "via-git"]], "Via Python Package": [[3, "via-python-package"]], "Visualization": [[10, "visualization"]], "What should I do with the output?": [[18, "what-should-i-do-with-the-output"]], "Word": [[7, "word"]], "docTR Notebooks": [[11, null]], "docTR Vocabs": [[6, "id62"]], "docTR: Document Text Recognition": [[4, null]], "doctr.contrib": [[5, null]], "doctr.datasets": [[6, null], [6, "datasets"]], "doctr.io": [[7, null]], "doctr.models": [[8, null]], "doctr.models.classification": [[8, "doctr-models-classification"]], "doctr.models.detection": [[8, "doctr-models-detection"]], "doctr.models.factory": [[8, 
"doctr-models-factory"]], "doctr.models.recognition": [[8, "doctr-models-recognition"]], "doctr.models.zoo": [[8, "doctr-models-zoo"]], "doctr.transforms": [[9, null]], "doctr.utils": [[10, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[7, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[7, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[9, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[6, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[9, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[9, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[6, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[6, "doctr.datasets.loader.DataLoader", false]], 
"db_mobilenet_v3_large() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[8, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[6, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[6, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[7, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[7, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[6, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[6, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[9, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[9, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[6, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[6, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[6, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[6, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[6, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[8, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[9, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[7, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[6, "doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() 
(in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[9, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[8, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[6, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[9, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[7, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[9, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[9, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[9, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[9, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[9, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[9, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[9, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[9, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[9, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[9, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[9, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[9, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[7, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[7, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[7, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[6, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[9, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[8, 
"doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[7, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[7, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[6, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[6, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[6, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[6, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[9, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[10, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[6, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[7, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[6, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[6, 0, 1, "", "CORD"], [6, 0, 1, "", "CharacterGenerator"], [6, 0, 1, "", "DetectionDataset"], [6, 0, 1, "", "DocArtefacts"], [6, 0, 1, "", "FUNSD"], [6, 0, 1, "", "IC03"], [6, 0, 1, "", "IC13"], [6, 0, 1, "", "IIIT5K"], [6, 0, 1, "", "IIITHWS"], [6, 0, 1, "", "IMGUR5K"], [6, 0, 1, "", "MJSynth"], [6, 0, 1, "", "OCRDataset"], [6, 0, 1, "", "RecognitionDataset"], [6, 0, 1, "", "SROIE"], [6, 0, 1, "", "SVHN"], [6, 0, 1, "", "SVT"], [6, 0, 1, "", "SynthText"], [6, 0, 1, "", 
"WILDRECEIPT"], [6, 0, 1, "", "WordGenerator"], [6, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[6, 0, 1, "", "DataLoader"]], "doctr.io": [[7, 0, 1, "", "Artefact"], [7, 0, 1, "", "Block"], [7, 0, 1, "", "Document"], [7, 0, 1, "", "DocumentFile"], [7, 0, 1, "", "Line"], [7, 0, 1, "", "Page"], [7, 0, 1, "", "Word"], [7, 1, 1, "", "decode_img_as_tensor"], [7, 1, 1, "", "read_html"], [7, 1, 1, "", "read_img_as_numpy"], [7, 1, 1, "", "read_img_as_tensor"], [7, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[7, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[7, 2, 1, "", "from_images"], [7, 2, 1, "", "from_pdf"], [7, 2, 1, "", "from_url"]], "doctr.io.Page": [[7, 2, 1, "", "show"]], "doctr.models": [[8, 1, 1, "", "kie_predictor"], [8, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[8, 1, 1, "", "crop_orientation_predictor"], [8, 1, 1, "", "magc_resnet31"], [8, 1, 1, "", "mobilenet_v3_large"], [8, 1, 1, "", "mobilenet_v3_large_r"], [8, 1, 1, "", "mobilenet_v3_small"], [8, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [8, 1, 1, "", "mobilenet_v3_small_page_orientation"], [8, 1, 1, "", "mobilenet_v3_small_r"], [8, 1, 1, "", "page_orientation_predictor"], [8, 1, 1, "", "resnet18"], [8, 1, 1, "", "resnet31"], [8, 1, 1, "", "resnet34"], [8, 1, 1, "", "resnet50"], [8, 1, 1, "", "textnet_base"], [8, 1, 1, "", "textnet_small"], [8, 1, 1, "", "textnet_tiny"], [8, 1, 1, "", "vgg16_bn_r"], [8, 1, 1, "", "vit_b"], [8, 1, 1, "", "vit_s"]], "doctr.models.detection": [[8, 1, 1, "", "db_mobilenet_v3_large"], [8, 1, 1, "", "db_resnet50"], [8, 1, 1, "", "detection_predictor"], [8, 1, 1, "", "fast_base"], [8, 1, 1, "", "fast_small"], [8, 1, 1, "", "fast_tiny"], [8, 1, 1, "", "linknet_resnet18"], [8, 1, 1, "", "linknet_resnet34"], [8, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[8, 1, 1, "", "from_hub"], [8, 1, 1, "", "login_to_hub"], [8, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[8, 1, 1, "", "crnn_mobilenet_v3_large"], [8, 1, 1, "", "crnn_mobilenet_v3_small"], [8, 1, 1, "", "crnn_vgg16_bn"], [8, 1, 1, "", "master"], [8, 1, 1, "", "parseq"], [8, 1, 1, "", "recognition_predictor"], [8, 1, 1, "", "sar_resnet31"], [8, 1, 1, "", "vitstr_base"], [8, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[9, 0, 1, "", "ChannelShuffle"], [9, 0, 1, "", "ColorInversion"], [9, 0, 1, "", "Compose"], [9, 0, 1, "", "GaussianBlur"], [9, 0, 1, "", "GaussianNoise"], [9, 0, 1, "", "LambdaTransformation"], [9, 0, 1, "", "Normalize"], [9, 0, 1, "", "OneOf"], [9, 0, 1, "", "RandomApply"], [9, 0, 1, "", "RandomBrightness"], [9, 0, 1, "", "RandomContrast"], [9, 0, 1, "", "RandomCrop"], [9, 0, 1, "", "RandomGamma"], [9, 0, 1, "", "RandomHorizontalFlip"], [9, 0, 1, "", "RandomHue"], [9, 0, 1, "", "RandomJpegQuality"], [9, 0, 1, "", "RandomResize"], [9, 0, 1, "", "RandomRotate"], [9, 0, 1, "", "RandomSaturation"], [9, 0, 1, "", "RandomShadow"], [9, 0, 1, "", "Resize"], [9, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[10, 0, 1, "", "DetectionMetric"], [10, 0, 1, "", "LocalizationConfusion"], [10, 0, 1, "", "OCRMetric"], [10, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.OCRMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.visualization": [[10, 1, 1, "", "visualize_page"]]}, 
"objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [1, 7, 8, 10, 14, 17], "0": [1, 3, 6, 9, 10, 12, 15, 16, 18], "00": 18, "01": 18, "0123456789": 6, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "02562": 8, "03": 18, "035": 18, "0361328125": 18, "04": 18, "05": 18, "06": 18, "06640625": 18, "07": 18, "08": [9, 18], "09": 18, "0966796875": 18, "1": [6, 7, 8, 9, 10, 12, 16, 18], "10": [6, 10, 18], "100": [6, 9, 10, 16, 18], "1000": 18, "101": 6, "1024": [8, 12, 18], "104": 6, "106": 6, "108": 6, "1095": 16, "11": 18, "110": 10, "1107": 16, "114": 6, "115": 6, "1156": 16, "116": 6, "118": 6, "11800h": 18, "11th": 18, "12": 18, "120": 6, "123": 6, "126": 6, "1268": 16, "128": [8, 12, 17, 18], "13": 18, "130": 6, "13068": 16, "131": 6, "1337891": 16, "1357421875": 18, "1396484375": 18, "14": 18, "1420": 18, "14470v1": 6, "149": 16, "15": 18, "150": [10, 18], "1552": 18, "16": [8, 17, 18], "1630859375": 18, "1684": 18, "16x16": 8, "17": 18, "1778": 18, "1782": 18, "18": [8, 18], "185546875": 18, "1900": 18, "1910": 8, "19342": 16, "19370": 16, "195": 6, "19598": 16, "199": 18, "1999": 18, "2": [3, 4, 6, 7, 9, 15, 18], "20": 18, "200": 10, "2000": 16, "2003": [4, 6], "2012": 6, "2013": [4, 6], "2015": 6, "2019": 4, "207901": 16, "21": 18, "2103": 6, "2186": 16, "21888": 16, "22": 18, "224": [8, 9], "225": 9, "22672": 16, "229": [9, 16], "23": 18, "233": 16, "236": 6, "24": 18, "246": 16, "249": 16, "25": 18, "2504": 18, "255": [7, 8, 9, 10, 18], "256": 8, "257": 16, "26": 18, "26032": 16, "264": 12, "27": 18, "2700": 16, "2710": 18, "2749": 12, "28": 18, "287": 12, "29": 18, "296": 12, "299": 12, "2d": 18, "3": [3, 4, 7, 8, 9, 10, 17, 18], "30": 18, "300": 16, "3000": 16, "301": 12, "30595": 18, "30ghz": 18, "31": 8, "32": [6, 8, 9, 12, 16, 17, 18], "3232421875": 18, "33": [9, 18], "33402": 16, "33608": 16, "34": [8, 18], "340": 18, "3456": 18, "3515625": 18, "36": 18, "360": 16, "37": [6, 18], "38": 18, "39": 18, "4": [8, 9, 10, 18], "40": 18, "406": 9, "41": 18, "42": 18, "43": 18, "44": 18, "45": 18, "456": 9, "46": 18, "47": 18, "472": 16, "48": [6, 18], "485": 9, "49": 18, "49377": 16, "5": [6, 9, 10, 15, 18], "50": [8, 16, 18], "51": 18, "51171875": 18, "512": 8, "52": [6, 18], "529": 18, "53": 18, "54": 18, "540": 18, "5478515625": 18, "55": 18, "56": 18, "57": 18, "58": [6, 18], "580": 18, "5810546875": 18, "583": 18, "59": 18, "597": 18, "5k": [4, 6], "5m": 18, "6": [9, 18], "60": 9, "600": [8, 10, 18], "61": 18, "62": 18, "626": 16, "63": 18, "64": [8, 9, 18], "641": 18, "647": 16, "65": 18, "66": 18, "67": 18, "68": 18, "69": 18, "693": 12, "694": 12, "695": 12, "6m": 18, "7": 18, "70": [6, 10, 18], "707470": 16, "71": [6, 18], "7100000": 16, "7141797": 16, "7149": 16, "72": 18, "72dpi": 7, "73": 18, "73257": 16, "74": 18, "75": [9, 18], "7581382": 16, "76": 18, "77": 18, "772": 12, "772875": 16, "78": 18, "785": 12, "79": 18, "793533": 16, "796": 16, "798": 12, "7m": 18, "8": [8, 9, 18], "80": 18, "800": [8, 10, 16, 18], "81": 18, "82": 18, "83": 18, "84": 18, "849": 16, "85": 18, "8564453125": 18, "857": 18, "85875": 16, "86": 18, "8603515625": 18, "87": 18, "8707": 16, "88": 18, "89": 18, "9": [3, 9, 18], "90": 18, "90k": 6, "90kdict32px": 6, "91": 18, "914085328578949": 18, "92": 18, "93": 18, "94": [6, 18], "95": 
[10, 18], "9578408598899841": 18, "96": 18, "97": 18, "98": 18, "99": 18, "9949972033500671": 18, "A": [1, 2, 4, 6, 7, 8, 11, 17], "As": 2, "Be": 18, "Being": 1, "By": 13, "For": [1, 2, 3, 12, 18], "If": [2, 7, 8, 12, 18], "In": [2, 6, 16], "It": [9, 14, 15, 17], "Its": [4, 8], "No": [1, 18], "Of": 6, "Or": [15, 17], "The": [1, 2, 6, 7, 10, 13, 15, 16, 17, 18], "Then": 8, "To": [2, 3, 13, 14, 15, 17, 18], "_": [1, 6, 8], "__call__": 18, "_build": 2, "_i": 10, "ab": 6, "abc": 17, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "abdef": [6, 16], "abl": [16, 18], "about": [1, 16, 18], "abov": 18, "abstractdataset": 6, "abus": 1, "accept": 1, "access": [4, 7, 16, 18], "account": [1, 14], "accur": 18, "accuraci": 10, "achiev": 17, "act": 1, "action": 1, "activ": 4, "ad": [2, 8, 9], "adapt": 1, "add": [9, 10, 14, 18], "add_hook": 18, "add_label": 10, "addit": [2, 3, 7, 15, 18], "addition": [2, 18], "address": [1, 7], "adjust": 9, "advanc": 1, "advantag": 17, "advis": 2, "aesthet": [4, 6], "affect": 1, "after": [14, 18], "ag": 1, "again": 8, "aggreg": [10, 16], "aggress": 1, "align": [1, 7, 9], "all": [1, 2, 5, 6, 7, 9, 10, 15, 16, 18], "allow": [1, 17], "along": 18, "alreadi": [2, 17], "also": [1, 8, 14, 15, 16, 18], "alwai": 16, "an": [1, 2, 4, 6, 7, 8, 10, 15, 17, 18], "analysi": [7, 15], "ancient_greek": 6, "angl": [7, 9], "ani": [1, 6, 7, 8, 9, 10, 17, 18], "annot": 6, "anot": 16, "anoth": [8, 12, 16], "answer": 1, "anyascii": 10, "anyon": 4, "anyth": 15, "api": [2, 4], "apolog": 1, "apologi": 1, "app": 2, "appear": 1, "appli": [1, 6, 9], "applic": [4, 8], "appoint": 1, "appreci": 14, "appropri": [1, 2, 18], "ar": [1, 2, 3, 5, 6, 7, 9, 10, 11, 15, 16, 18], "arab": 6, "arabic_diacrit": 6, "arabic_lett": 6, "arabic_punctu": 6, "arbitrarili": [4, 8], "arch": [8, 14], "architectur": [4, 8, 14, 15], "area": 18, "argument": [6, 7, 8, 10, 12, 18], "around": 1, "arrai": [7, 9, 10], "art": [4, 15], "artefact": [10, 15, 18], "artefact_typ": 7, "artifici": [4, 6], "arxiv": [6, 8], "asarrai": 10, "ascii_lett": 6, "aspect": [4, 8, 9, 18], "assess": 10, "assign": 10, "associ": 7, "assum": 8, "assume_straight_pag": [8, 12, 18], "astyp": [8, 10, 18], "attack": 1, "attend": [4, 8], "attent": [1, 8], "autom": 4, "automat": 18, "autoregress": [4, 8], "avail": [1, 4, 5, 9], "averag": [9, 18], "avoid": [1, 3], "aw": [4, 18], "awar": 18, "azur": 18, "b": [8, 10, 18], "b_j": 10, "back": 2, "backbon": 8, "backend": 18, "background": 16, "bangla": 6, "bar": 15, "bar_cod": 16, "base": [4, 8, 15], "baselin": [4, 8, 18], "batch": [6, 8, 9, 15, 16, 18], "batch_siz": [6, 12, 15, 16, 17], "bblanchon": 3, "bbox": 18, "becaus": 13, "been": [2, 10, 16, 18], "befor": [6, 8, 9, 18], "begin": 10, "behavior": [1, 18], "being": [10, 18], "belong": 18, "benchmark": 18, "best": 1, "better": [11, 18], "between": [9, 10, 18], "bgr": 7, "bilinear": 9, "bin_thresh": 18, "binar": [4, 8, 18], "binari": [7, 17, 18], "bit": 17, "block": [10, 18], "block_1_1": 18, "blur": 9, "bmvc": 6, "bn": 14, "bodi": [1, 18], "bool": [6, 7, 8, 9, 10], "boolean": [8, 18], "both": [4, 6, 9, 16, 18], "bottom": [8, 18], "bound": [6, 7, 8, 9, 10, 15, 16, 18], "box": [6, 7, 8, 9, 10, 15, 16, 18], "box_thresh": 18, "bright": 9, "browser": [2, 4], "build": [2, 3, 17], "built": 2, "byte": [7, 18], "c": [3, 7, 10], "c_j": 10, "cach": [2, 6, 13], "cache_sampl": 6, "call": 17, "callabl": [6, 9], "can": [2, 3, 12, 13, 14, 15, 16, 18], "capabl": [2, 11, 18], "case": [6, 10], "cf": 18, "cfg": 18, "challeng": 6, "challenge2_test_task12_imag": 6, 
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[1, "correction"]], "2. Warning": [[1, "warning"]], "3. Temporary Ban": [[1, "temporary-ban"]], "4. Permanent Ban": [[1, "permanent-ban"]], "AWS Lambda": [[13, null]], "Advanced options": [[18, "advanced-options"]], "Args:": [[6, "args"], [6, "id4"], [6, "id7"], [6, "id10"], [6, "id13"], [6, "id16"], [6, "id19"], [6, "id22"], [6, "id25"], [6, "id29"], [6, "id32"], [6, "id37"], [6, "id40"], [6, "id46"], [6, "id49"], [6, "id50"], [6, "id51"], [6, "id54"], [6, "id57"], [6, "id60"], [6, "id61"], [7, "args"], [7, "id2"], [7, "id3"], [7, "id4"], [7, "id5"], [7, "id6"], [7, "id7"], [7, "id10"], [7, "id12"], [7, "id14"], [7, "id16"], [7, "id20"], [7, "id24"], [7, "id28"], [8, "args"], [8, "id3"], [8, "id8"], [8, "id13"], [8, "id17"], [8, "id21"], [8, "id26"], [8, "id31"], [8, "id36"], [8, "id41"], [8, "id46"], [8, "id50"], [8, "id54"], [8, "id59"], [8, "id63"], [8, "id68"], [8, "id73"], [8, "id77"], [8, "id81"], [8, "id85"], [8, "id90"], [8, "id95"], [8, "id99"], [8, "id104"], [8, "id109"], [8, "id114"], [8, "id119"], [8, "id123"], [8, "id127"], [8, "id132"], [8, "id137"], [8, "id142"], [8, "id146"], [8, "id150"], [8, "id155"], [8, "id159"], [8, "id163"], [8, "id167"], [8, "id169"], [8, "id171"], [8, "id173"], [9, "args"], [9, "id1"], [9, "id2"], [9, "id3"], [9, "id4"], [9, "id5"], [9, "id6"], [9, "id7"], [9, "id8"], [9, "id9"], [9, "id10"], [9, "id11"], [9, "id12"], [9, "id13"], [9, "id14"], [9, "id15"], [9, "id16"], [9, "id17"], [9, "id18"], [9, "id19"], [10, "args"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"]], "Artefact": [[7, "artefact"]], "ArtefactDetection": [[15, "artefactdetection"]], "Attribution": [[1, "attribution"]], "Available Datasets": [[16, "available-datasets"]], "Available architectures": [[18, "available-architectures"], [18, "id1"], [18, "id2"]], "Available contribution modules": [[15, "available-contribution-modules"]], "Block": [[7, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[16, null]], "Choosing the right model": [[18, null]], "Classification": [[14, "classification"]], "Code quality": [[2, "code-quality"]], "Code style verification": [[2, "code-style-verification"]], "Codebase structure": [[2, "codebase-structure"]], "Commits": [[2, "commits"]], "Composing transformations": [[9, "composing-transformations"]], "Continuous Integration": [[2, "continuous-integration"]], "Contributing to docTR": [[2, null]], "Contributor Covenant Code of Conduct": [[1, null]], "Custom dataset loader": [[6, "custom-dataset-loader"]], "Custom orientation classification models": [[12, "custom-orientation-classification-models"]], "Data Loading": [[16, "data-loading"]], "Dataloader": [[6, "dataloader"]], "Detection": [[14, "detection"], [16, "detection"]], "Detection predictors": [[18, "detection-predictors"]], "Developer mode installation": [[2, "developer-mode-installation"]], "Developing docTR": [[2, "developing-doctr"]], "Document": [[7, "document"]], "Document structure": [[7, "document-structure"]], "End-to-End OCR": [[18, "end-to-end-ocr"]], "Enforcement": [[1, "enforcement"]], "Enforcement Guidelines": [[1, "enforcement-guidelines"]], "Enforcement Responsibilities": [[1, "enforcement-responsibilities"]], "Export to ONNX": [[17, "export-to-onnx"]], "Feature requests & bug report": [[2, "feature-requests-bug-report"]], "Feedback": [[2, "feedback"]], "File reading": [[7, "file-reading"]], "Half-precision": [[17, "half-precision"]], "Installation": [[3, null]], 
"Integrate contributions into your pipeline": [[15, null]], "Let\u2019s connect": [[2, "let-s-connect"]], "Line": [[7, "line"]], "Loading from Huggingface Hub": [[14, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[12, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[12, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[4, "main-features"]], "Model optimization": [[17, "model-optimization"]], "Model zoo": [[4, "model-zoo"]], "Modifying the documentation": [[2, "modifying-the-documentation"]], "Naming conventions": [[14, "naming-conventions"]], "OCR": [[16, "ocr"]], "Object Detection": [[16, "object-detection"]], "Our Pledge": [[1, "our-pledge"]], "Our Standards": [[1, "our-standards"]], "Page": [[7, "page"]], "Preparing your model for inference": [[17, null]], "Prerequisites": [[3, "prerequisites"]], "Pretrained community models": [[14, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[14, "pushing-to-the-huggingface-hub"]], "Questions": [[2, "questions"]], "Recognition": [[14, "recognition"], [16, "recognition"]], "Recognition predictors": [[18, "recognition-predictors"]], "Returns:": [[6, "returns"], [7, "returns"], [7, "id11"], [7, "id13"], [7, "id15"], [7, "id19"], [7, "id23"], [7, "id27"], [7, "id31"], [8, "returns"], [8, "id6"], [8, "id11"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id29"], [8, "id34"], [8, "id39"], [8, "id44"], [8, "id49"], [8, "id53"], [8, "id57"], [8, "id62"], [8, "id66"], [8, "id71"], [8, "id76"], [8, "id80"], [8, "id84"], [8, "id88"], [8, "id93"], [8, "id98"], [8, "id102"], [8, "id107"], [8, "id112"], [8, "id117"], [8, "id122"], [8, "id126"], [8, "id130"], [8, "id135"], [8, "id140"], [8, "id145"], [8, "id149"], [8, "id153"], [8, "id158"], [8, "id162"], [8, "id166"], [8, "id168"], [8, "id170"], [8, "id172"], [10, "returns"]], "Scope": [[1, "scope"]], "Share your model with the community": [[14, null]], "Supported Vocabs": [[6, "supported-vocabs"]], "Supported contribution modules": [[5, "supported-contribution-modules"]], "Supported datasets": [[4, "supported-datasets"]], "Supported transformations": [[9, "supported-transformations"]], "Synthetic dataset generator": [[6, "synthetic-dataset-generator"], [16, "synthetic-dataset-generator"]], "Task evaluation": [[10, "task-evaluation"]], "Text Detection": [[18, "text-detection"]], "Text Recognition": [[18, "text-recognition"]], "Text detection models": [[4, "text-detection-models"]], "Text recognition models": [[4, "text-recognition-models"]], "Train your own model": [[12, null]], "Two-stage approaches": [[18, "two-stage-approaches"]], "Unit tests": [[2, "unit-tests"]], "Use your own datasets": [[16, "use-your-own-datasets"]], "Using your ONNX exported model": [[17, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[3, "via-conda-only-for-linux"]], "Via Git": [[3, "via-git"]], "Via Python Package": [[3, "via-python-package"]], "Visualization": [[10, "visualization"]], "What should I do with the output?": [[18, "what-should-i-do-with-the-output"]], "Word": [[7, "word"]], "docTR Notebooks": [[11, null]], "docTR Vocabs": [[6, "id62"]], "docTR: Document Text Recognition": [[4, null]], "doctr.contrib": [[5, null]], "doctr.datasets": [[6, null], [6, "datasets"]], "doctr.io": [[7, null]], "doctr.models": [[8, null]], "doctr.models.classification": [[8, "doctr-models-classification"]], "doctr.models.detection": [[8, "doctr-models-detection"]], "doctr.models.factory": [[8, 
"doctr-models-factory"]], "doctr.models.recognition": [[8, "doctr-models-recognition"]], "doctr.models.zoo": [[8, "doctr-models-zoo"]], "doctr.transforms": [[9, null]], "doctr.utils": [[10, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[7, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[7, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[9, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[6, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[9, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[9, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[6, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[6, "doctr.datasets.loader.DataLoader", false]], 
"db_mobilenet_v3_large() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[8, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[6, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[6, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[7, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[7, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[6, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[6, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[9, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[9, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[6, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[6, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[6, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[6, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[6, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[8, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[9, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[7, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[6, "doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() 
(in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[9, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[8, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[6, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[9, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[7, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[9, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[9, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[9, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[9, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[9, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[9, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[9, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[9, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[9, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[9, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[9, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[9, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[7, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[7, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[7, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[6, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[9, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[8, 
"doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[7, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[7, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[6, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[6, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[6, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[6, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[9, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[10, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[6, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[7, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[6, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[6, 0, 1, "", "CORD"], [6, 0, 1, "", "CharacterGenerator"], [6, 0, 1, "", "DetectionDataset"], [6, 0, 1, "", "DocArtefacts"], [6, 0, 1, "", "FUNSD"], [6, 0, 1, "", "IC03"], [6, 0, 1, "", "IC13"], [6, 0, 1, "", "IIIT5K"], [6, 0, 1, "", "IIITHWS"], [6, 0, 1, "", "IMGUR5K"], [6, 0, 1, "", "MJSynth"], [6, 0, 1, "", "OCRDataset"], [6, 0, 1, "", "RecognitionDataset"], [6, 0, 1, "", "SROIE"], [6, 0, 1, "", "SVHN"], [6, 0, 1, "", "SVT"], [6, 0, 1, "", "SynthText"], [6, 0, 1, "", 
"WILDRECEIPT"], [6, 0, 1, "", "WordGenerator"], [6, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[6, 0, 1, "", "DataLoader"]], "doctr.io": [[7, 0, 1, "", "Artefact"], [7, 0, 1, "", "Block"], [7, 0, 1, "", "Document"], [7, 0, 1, "", "DocumentFile"], [7, 0, 1, "", "Line"], [7, 0, 1, "", "Page"], [7, 0, 1, "", "Word"], [7, 1, 1, "", "decode_img_as_tensor"], [7, 1, 1, "", "read_html"], [7, 1, 1, "", "read_img_as_numpy"], [7, 1, 1, "", "read_img_as_tensor"], [7, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[7, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[7, 2, 1, "", "from_images"], [7, 2, 1, "", "from_pdf"], [7, 2, 1, "", "from_url"]], "doctr.io.Page": [[7, 2, 1, "", "show"]], "doctr.models": [[8, 1, 1, "", "kie_predictor"], [8, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[8, 1, 1, "", "crop_orientation_predictor"], [8, 1, 1, "", "magc_resnet31"], [8, 1, 1, "", "mobilenet_v3_large"], [8, 1, 1, "", "mobilenet_v3_large_r"], [8, 1, 1, "", "mobilenet_v3_small"], [8, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [8, 1, 1, "", "mobilenet_v3_small_page_orientation"], [8, 1, 1, "", "mobilenet_v3_small_r"], [8, 1, 1, "", "page_orientation_predictor"], [8, 1, 1, "", "resnet18"], [8, 1, 1, "", "resnet31"], [8, 1, 1, "", "resnet34"], [8, 1, 1, "", "resnet50"], [8, 1, 1, "", "textnet_base"], [8, 1, 1, "", "textnet_small"], [8, 1, 1, "", "textnet_tiny"], [8, 1, 1, "", "vgg16_bn_r"], [8, 1, 1, "", "vit_b"], [8, 1, 1, "", "vit_s"]], "doctr.models.detection": [[8, 1, 1, "", "db_mobilenet_v3_large"], [8, 1, 1, "", "db_resnet50"], [8, 1, 1, "", "detection_predictor"], [8, 1, 1, "", "fast_base"], [8, 1, 1, "", "fast_small"], [8, 1, 1, "", "fast_tiny"], [8, 1, 1, "", "linknet_resnet18"], [8, 1, 1, "", "linknet_resnet34"], [8, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[8, 1, 1, "", "from_hub"], [8, 1, 1, "", "login_to_hub"], [8, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[8, 1, 1, "", "crnn_mobilenet_v3_large"], [8, 1, 1, "", "crnn_mobilenet_v3_small"], [8, 1, 1, "", "crnn_vgg16_bn"], [8, 1, 1, "", "master"], [8, 1, 1, "", "parseq"], [8, 1, 1, "", "recognition_predictor"], [8, 1, 1, "", "sar_resnet31"], [8, 1, 1, "", "vitstr_base"], [8, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[9, 0, 1, "", "ChannelShuffle"], [9, 0, 1, "", "ColorInversion"], [9, 0, 1, "", "Compose"], [9, 0, 1, "", "GaussianBlur"], [9, 0, 1, "", "GaussianNoise"], [9, 0, 1, "", "LambdaTransformation"], [9, 0, 1, "", "Normalize"], [9, 0, 1, "", "OneOf"], [9, 0, 1, "", "RandomApply"], [9, 0, 1, "", "RandomBrightness"], [9, 0, 1, "", "RandomContrast"], [9, 0, 1, "", "RandomCrop"], [9, 0, 1, "", "RandomGamma"], [9, 0, 1, "", "RandomHorizontalFlip"], [9, 0, 1, "", "RandomHue"], [9, 0, 1, "", "RandomJpegQuality"], [9, 0, 1, "", "RandomResize"], [9, 0, 1, "", "RandomRotate"], [9, 0, 1, "", "RandomSaturation"], [9, 0, 1, "", "RandomShadow"], [9, 0, 1, "", "Resize"], [9, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[10, 0, 1, "", "DetectionMetric"], [10, 0, 1, "", "LocalizationConfusion"], [10, 0, 1, "", "OCRMetric"], [10, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.OCRMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.visualization": [[10, 1, 1, "", "visualize_page"]]}, 
"objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [1, 7, 8, 10, 14, 17], "0": [1, 3, 6, 9, 10, 12, 15, 16, 18], "00": 18, "01": 18, "0123456789": 6, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "02562": 8, "03": 18, "035": 18, "0361328125": 18, "04": 18, "05": 18, "06": 18, "06640625": 18, "07": 18, "08": [9, 18], "09": 18, "0966796875": 18, "1": [6, 7, 8, 9, 10, 12, 16, 18], "10": [6, 10, 18], "100": [6, 9, 10, 16, 18], "1000": 18, "101": 6, "1024": [8, 12, 18], "104": 6, "106": 6, "108": 6, "1095": 16, "11": 18, "110": 10, "1107": 16, "114": 6, "115": 6, "1156": 16, "116": 6, "118": 6, "11800h": 18, "11th": 18, "12": 18, "120": 6, "123": 6, "126": 6, "1268": 16, "128": [8, 12, 17, 18], "13": 18, "130": 6, "13068": 16, "131": 6, "1337891": 16, "1357421875": 18, "1396484375": 18, "14": 18, "1420": 18, "14470v1": 6, "149": 16, "15": 18, "150": [10, 18], "1552": 18, "16": [8, 17, 18], "1630859375": 18, "1684": 18, "16x16": 8, "17": 18, "1778": 18, "1782": 18, "18": [8, 18], "185546875": 18, "1900": 18, "1910": 8, "19342": 16, "19370": 16, "195": 6, "19598": 16, "199": 18, "1999": 18, "2": [3, 4, 6, 7, 9, 15, 18], "20": 18, "200": 10, "2000": 16, "2003": [4, 6], "2012": 6, "2013": [4, 6], "2015": 6, "2019": 4, "207901": 16, "21": 18, "2103": 6, "2186": 16, "21888": 16, "22": 18, "224": [8, 9], "225": 9, "22672": 16, "229": [9, 16], "23": 18, "233": 16, "236": 6, "24": 18, "246": 16, "249": 16, "25": 18, "2504": 18, "255": [7, 8, 9, 10, 18], "256": 8, "257": 16, "26": 18, "26032": 16, "264": 12, "27": 18, "2700": 16, "2710": 18, "2749": 12, "28": 18, "287": 12, "29": 18, "296": 12, "299": 12, "2d": 18, "3": [3, 4, 7, 8, 9, 10, 17, 18], "30": 18, "300": 16, "3000": 16, "301": 12, "30595": 18, "30ghz": 18, "31": 8, "32": [6, 8, 9, 12, 16, 17, 18], "3232421875": 18, "33": [9, 18], "33402": 16, "33608": 16, "34": [8, 18], "340": 18, "3456": 18, "3515625": 18, "36": 18, "360": 16, "37": [6, 18], "38": 18, "39": 18, "4": [8, 9, 10, 18], "40": 18, "406": 9, "41": 18, "42": 18, "43": 18, "44": 18, "45": 18, "456": 9, "46": 18, "47": 18, "472": 16, "48": [6, 18], "485": 9, "49": 18, "49377": 16, "5": [6, 9, 10, 15, 18], "50": [8, 16, 18], "51": 18, "51171875": 18, "512": 8, "52": [6, 18], "529": 18, "53": 18, "54": 18, "540": 18, "5478515625": 18, "55": 18, "56": 18, "57": 18, "58": [6, 18], "580": 18, "5810546875": 18, "583": 18, "59": 18, "597": 18, "5k": [4, 6], "5m": 18, "6": [9, 18], "60": 9, "600": [8, 10, 18], "61": 18, "62": 18, "626": 16, "63": 18, "64": [8, 9, 18], "641": 18, "647": 16, "65": 18, "66": 18, "67": 18, "68": 18, "69": 18, "693": 12, "694": 12, "695": 12, "6m": 18, "7": 18, "70": [6, 10, 18], "707470": 16, "71": [6, 18], "7100000": 16, "7141797": 16, "7149": 16, "72": 18, "72dpi": 7, "73": 18, "73257": 16, "74": 18, "75": [9, 18], "7581382": 16, "76": 18, "77": 18, "772": 12, "772875": 16, "78": 18, "785": 12, "79": 18, "793533": 16, "796": 16, "798": 12, "7m": 18, "8": [8, 9, 18], "80": 18, "800": [8, 10, 16, 18], "81": 18, "82": 18, "83": 18, "84": 18, "849": 16, "85": 18, "8564453125": 18, "857": 18, "85875": 16, "86": 18, "8603515625": 18, "87": 18, "8707": 16, "88": 18, "89": 18, "9": [3, 9, 18], "90": 18, "90k": 6, "90kdict32px": 6, "91": 18, "914085328578949": 18, "92": 18, "93": 18, "94": [6, 18], "95": 
[10, 18], "9578408598899841": 18, "96": 18, "97": 18, "98": 18, "99": 18, "9949972033500671": 18, "A": [1, 2, 4, 6, 7, 8, 11, 17], "As": 2, "Be": 18, "Being": 1, "By": 13, "For": [1, 2, 3, 12, 18], "If": [2, 7, 8, 12, 18], "In": [2, 6, 16], "It": [9, 14, 15, 17], "Its": [4, 8], "No": [1, 18], "Of": 6, "Or": [15, 17], "The": [1, 2, 6, 7, 10, 13, 15, 16, 17, 18], "Then": 8, "To": [2, 3, 13, 14, 15, 17, 18], "_": [1, 6, 8], "__call__": 18, "_build": 2, "_i": 10, "ab": 6, "abc": 17, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "abdef": [6, 16], "abl": [16, 18], "about": [1, 16, 18], "abov": 18, "abstractdataset": 6, "abus": 1, "accept": 1, "access": [4, 7, 16, 18], "account": [1, 14], "accur": 18, "accuraci": 10, "achiev": 17, "act": 1, "action": 1, "activ": 4, "ad": [2, 8, 9], "adapt": 1, "add": [9, 10, 14, 18], "add_hook": 18, "add_label": 10, "addit": [2, 3, 7, 15, 18], "addition": [2, 18], "address": [1, 7], "adjust": 9, "advanc": 1, "advantag": 17, "advis": 2, "aesthet": [4, 6], "affect": 1, "after": [14, 18], "ag": 1, "again": 8, "aggreg": [10, 16], "aggress": 1, "align": [1, 7, 9], "all": [1, 2, 5, 6, 7, 9, 10, 15, 16, 18], "allow": [1, 17], "along": 18, "alreadi": [2, 17], "also": [1, 8, 14, 15, 16, 18], "alwai": 16, "an": [1, 2, 4, 6, 7, 8, 10, 15, 17, 18], "analysi": [7, 15], "ancient_greek": 6, "angl": [7, 9], "ani": [1, 6, 7, 8, 9, 10, 17, 18], "annot": 6, "anot": 16, "anoth": [8, 12, 16], "answer": 1, "anyascii": 10, "anyon": 4, "anyth": 15, "api": [2, 4], "apolog": 1, "apologi": 1, "app": 2, "appear": 1, "appli": [1, 6, 9], "applic": [4, 8], "appoint": 1, "appreci": 14, "appropri": [1, 2, 18], "ar": [1, 2, 3, 5, 6, 7, 9, 10, 11, 15, 16, 18], "arab": 6, "arabic_diacrit": 6, "arabic_lett": 6, "arabic_punctu": 6, "arbitrarili": [4, 8], "arch": [8, 14], "architectur": [4, 8, 14, 15], "area": 18, "argument": [6, 7, 8, 10, 12, 18], "around": 1, "arrai": [7, 9, 10], "art": [4, 15], "artefact": [10, 15, 18], "artefact_typ": 7, "artifici": [4, 6], "arxiv": [6, 8], "asarrai": 10, "ascii_lett": 6, "aspect": [4, 8, 9, 18], "assess": 10, "assign": 10, "associ": 7, "assum": 8, "assume_straight_pag": [8, 12, 18], "astyp": [8, 10, 18], "attack": 1, "attend": [4, 8], "attent": [1, 8], "autom": 4, "automat": 18, "autoregress": [4, 8], "avail": [1, 4, 5, 9], "averag": [9, 18], "avoid": [1, 3], "aw": [4, 18], "awar": 18, "azur": 18, "b": [8, 10, 18], "b_j": 10, "back": 2, "backbon": 8, "backend": 18, "background": 16, "bangla": 6, "bar": 15, "bar_cod": 16, "base": [4, 8, 15], "baselin": [4, 8, 18], "batch": [6, 8, 9, 15, 16, 18], "batch_siz": [6, 12, 15, 16, 17], "bblanchon": 3, "bbox": 18, "becaus": 13, "been": [2, 10, 16, 18], "befor": [6, 8, 9, 18], "begin": 10, "behavior": [1, 18], "being": [10, 18], "belong": 18, "benchmark": 18, "best": 1, "better": [11, 18], "between": [9, 10, 18], "bgr": 7, "bilinear": 9, "bin_thresh": 18, "binar": [4, 8, 18], "binari": [7, 17, 18], "bit": 17, "block": [10, 18], "block_1_1": 18, "blur": 9, "bmvc": 6, "bn": 14, "bodi": [1, 18], "bool": [6, 7, 8, 9, 10], "boolean": [8, 18], "both": [4, 6, 9, 16, 18], "bottom": [8, 18], "bound": [6, 7, 8, 9, 10, 15, 16, 18], "box": [6, 7, 8, 9, 10, 15, 16, 18], "box_thresh": 18, "bright": 9, "browser": [2, 4], "build": [2, 3, 17], "built": 2, "byte": [7, 18], "c": [3, 7, 10], "c_j": 10, "cach": [2, 6, 13], "cache_sampl": 6, "call": 17, "callabl": [6, 9], "can": [2, 3, 12, 13, 14, 15, 16, 18], "capabl": [2, 11, 18], "case": [6, 10], "cf": 18, "cfg": 18, "challeng": 6, "challenge2_test_task12_imag": 6, 
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
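The searchindex.js blob above is not hand-edited content: it is the single Search.setIndex(...) payload that Sphinx regenerates on every build, mapping each stemmed term to the ids of the documents that contain it — values are either a bare document id or an array of ids, as the blob itself shows (compare "\u0950": 6 with "zoo": [4, 8]). A minimal sketch of resolving one term against it from the built page's console, assuming the standard Sphinx index layout — the "terms" and "docnames" keys and the query term "ocr" are assumptions for illustration, not taken from this diff:

// Sketch: resolve one stemmed term against the index installed by
// searchindex.js via Search.setIndex(...) (Search._index is the same
// object that searchtools.js, diffed further down, reads from).
// The "terms"/"docnames" keys are assumed standard Sphinx layout.
const index = Search._index;
let hits = index.terms["ocr"] ?? [];      // doc id(s) containing the term
if (!Array.isArray(hits)) hits = [hits];  // single hits are stored as bare ints
const pages = hits.map((i) => ({
  doc: index.docnames[i],                 // source document name (assumed key)
  title: index.titles[i],                 // page title (the "titles" array above)
}));
console.log(pages);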
diff --git a/using_doctr/custom_models_training.html b/using_doctr/custom_models_training.html
index 580b4368b7..e664c6a950 100644
--- a/using_doctr/custom_models_training.html
+++ b/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -615,7 +615,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/using_doctr/running_on_aws.html b/using_doctr/running_on_aws.html
index ddb0c3c80f..81c38b49f5 100644
--- a/using_doctr/running_on_aws.html
+++ b/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -358,7 +358,7 @@ AWS Lambda
-
+
diff --git a/using_doctr/sharing_models.html b/using_doctr/sharing_models.html
index 07a3b2f2a3..4f5d1d68a5 100644
--- a/using_doctr/sharing_models.html
+++ b/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -540,7 +540,7 @@ Recognition
-
+
diff --git a/using_doctr/using_contrib_modules.html b/using_doctr/using_contrib_modules.html
index b4a10925e6..cf282ff3a4 100644
--- a/using_doctr/using_contrib_modules.html
+++ b/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -411,7 +411,7 @@ ArtefactDetection
-
+
diff --git a/using_doctr/using_datasets.html b/using_doctr/using_datasets.html
index 4a52df36ba..e30b6d6459 100644
--- a/using_doctr/using_datasets.html
+++ b/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -638,7 +638,7 @@ Data Loading
-
+
diff --git a/using_doctr/using_model_export.html b/using_doctr/using_model_export.html
index 2b30ee63a1..ad9d09ed4c 100644
--- a/using_doctr/using_model_export.html
+++ b/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -463,7 +463,7 @@ Using your ONNX exported model
-
+
diff --git a/using_doctr/using_models.html b/using_doctr/using_models.html
index 13cb06116b..5c80dbf62d 100644
--- a/using_doctr/using_models.html
+++ b/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1249,7 +1249,7 @@ Advanced options
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/cord.html b/v0.1.0/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.0/_modules/doctr/datasets/cord.html
+++ b/v0.1.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/detection.html b/v0.1.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.0/_modules/doctr/datasets/detection.html
+++ b/v0.1.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/funsd.html b/v0.1.0/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.0/_modules/doctr/datasets/funsd.html
+++ b/v0.1.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic03.html b/v0.1.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.0/_modules/doctr/datasets/ic03.html
+++ b/v0.1.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic13.html b/v0.1.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.0/_modules/doctr/datasets/ic13.html
+++ b/v0.1.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiit5k.html b/v0.1.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiithws.html b/v0.1.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/imgur5k.html b/v0.1.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/loader.html b/v0.1.0/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.0/_modules/doctr/datasets/loader.html
+++ b/v0.1.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/mjsynth.html b/v0.1.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ocr.html b/v0.1.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.0/_modules/doctr/datasets/ocr.html
+++ b/v0.1.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/recognition.html b/v0.1.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.0/_modules/doctr/datasets/recognition.html
+++ b/v0.1.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/sroie.html b/v0.1.0/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.0/_modules/doctr/datasets/sroie.html
+++ b/v0.1.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svhn.html b/v0.1.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.0/_modules/doctr/datasets/svhn.html
+++ b/v0.1.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svt.html b/v0.1.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.0/_modules/doctr/datasets/svt.html
+++ b/v0.1.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/synthtext.html b/v0.1.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/utils.html b/v0.1.0/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.0/_modules/doctr/datasets/utils.html
+++ b/v0.1.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/wildreceipt.html b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.0/_modules/doctr/io/elements.html b/v0.1.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.0/_modules/doctr/io/elements.html
+++ b/v0.1.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.0/_modules/doctr/io/html.html b/v0.1.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.0/_modules/doctr/io/html.html
+++ b/v0.1.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/base.html b/v0.1.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.0/_modules/doctr/io/image/base.html
+++ b/v0.1.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/tensorflow.html b/v0.1.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/io/pdf.html b/v0.1.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.0/_modules/doctr/io/pdf.html
+++ b/v0.1.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.0/_modules/doctr/io/reader.html b/v0.1.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.0/_modules/doctr/io/reader.html
+++ b/v0.1.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/zoo.html b/v0.1.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/zoo.html b/v0.1.0/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/factory/hub.html b/v0.1.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.0/_modules/doctr/models/factory/hub.html
+++ b/v0.1.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/zoo.html b/v0.1.0/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/zoo.html b/v0.1.0/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.0/_modules/doctr/models/zoo.html
+++ b/v0.1.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/base.html b/v0.1.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/utils/metrics.html b/v0.1.0/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.0/_modules/doctr/utils/metrics.html
+++ b/v0.1.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.0/_modules/doctr/utils/visualization.html b/v0.1.0/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.0/_modules/doctr/utils/visualization.html
+++ b/v0.1.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.0/_modules/index.html b/v0.1.0/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.0/_modules/index.html
+++ b/v0.1.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.0/_sources/getting_started/installing.rst.txt b/v0.1.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.0/_sources/getting_started/installing.rst.txt
+++ b/v0.1.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.0/_static/basic.css b/v0.1.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.0/_static/basic.css
+++ b/v0.1.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.0/_static/doctools.js b/v0.1.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.0/_static/doctools.js
+++ b/v0.1.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.0/_static/language_data.js b/v0.1.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.0/_static/language_data.js
+++ b/v0.1.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.0/_static/searchtools.js b/v0.1.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.0/_static/searchtools.js
+++ b/v0.1.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
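
Taken together, the four insertions above tag every result tuple with one of the SearchResultKind values: title matches, index entries, domain objects, and full-text hits. A hypothetical post-processing step (not in the diff) could bucket results by that new seventh element:

// Sketch only: count results per kind via the seventh tuple element.
const countByKind = (results) =>
  results.reduce((acc, [, , , , , , kind]) => {
    acc[kind] = (acc[kind] || 0) + 1;
    return acc;
  }, {});
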
diff --git a/v0.1.0/changelog.html b/v0.1.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.0/changelog.html
+++ b/v0.1.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.0/community/resources.html b/v0.1.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.0/community/resources.html
+++ b/v0.1.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.0/contributing/code_of_conduct.html b/v0.1.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.0/contributing/code_of_conduct.html
+++ b/v0.1.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.0/contributing/contributing.html b/v0.1.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.0/contributing/contributing.html
+++ b/v0.1.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.0/genindex.html b/v0.1.0/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.0/genindex.html
+++ b/v0.1.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.0/getting_started/installing.html b/v0.1.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.0/getting_started/installing.html
+++ b/v0.1.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.0/index.html b/v0.1.0/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.0/index.html
+++ b/v0.1.0/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.0/modules/contrib.html b/v0.1.0/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.0/modules/contrib.html
+++ b/v0.1.0/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.0/modules/datasets.html b/v0.1.0/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.0/modules/datasets.html
+++ b/v0.1.0/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.0/modules/io.html b/v0.1.0/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.0/modules/io.html
+++ b/v0.1.0/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.0/modules/models.html b/v0.1.0/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.0/modules/models.html
+++ b/v0.1.0/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/transforms.html b/v0.1.0/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.0/modules/transforms.html
+++ b/v0.1.0/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
[minified search-index payload omitted: byte-identical to the latest/searchindex.js entry earlier in this diff (same blob hashes, bfa546d0e9..6f154115ab)]
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
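A note on the searchindex.js payload above: it is a single minified Sphinx Search.setIndex({...}) call, and the large map it carries pairs each stemmed token with indices into the "docnames" array, storing a bare number when a token occurs in only one document (e.g. "light": 5). A minimal sketch of that lookup, assuming the standard Sphinx index layout; pagesForTerm is a hypothetical helper, not part of this patch:

// Sketch only: resolve a stemmed term from the index dumped above.
// `index` stands for the object passed to Search.setIndex(...).
function pagesForTerm(index, term) {
  const hits = index.terms[term] ?? [];               // e.g. "linknet_resnet18" -> [9, 13, 18, 19]
  const files = Array.isArray(hits) ? hits : [hits];  // single hits are stored as a bare number
  return files.map((i) => index.docnames[i]);         // indices point into "docnames"
}
// e.g. pagesForTerm(index, "light") -> ["index"] under the dump above.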
diff --git a/v0.1.0/using_doctr/custom_models_training.html b/v0.1.0/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.0/using_doctr/custom_models_training.html
+++ b/v0.1.0/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.0/using_doctr/running_on_aws.html b/v0.1.0/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.0/using_doctr/running_on_aws.html
+++ b/v0.1.0/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
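The stopwords array above is what searchtools.js consults when cleaning a query, so common English words never reach the index lookup. An illustrative one-liner, assuming a hypothetical filterStopwords helper (the real filtering lives inside searchtools.js):

// Illustrative only: dropping stopword tokens from a parsed query.
const filterStopwords = (queryTerms) =>
  queryTerms.filter((t) => !stopwords.includes(t.toLowerCase()));
// filterStopwords(["the", "doctr", "models"]) -> ["doctr", "models"]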
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
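Taken together, the searchtools.js hunks above make three behavioral changes: each result tuple gains a seventh element, kind (one of the SearchResultKind values), _displayItem stamps every rendered item with a kind-${kind} CSS class, and the result-count message is pluralized through Documentation.ngettext instead of a single interpolated string. A minimal sketch of consuming the new field, mirroring the commented Scorer example in the patch; the boost value is made up:

// Sketch only: a custom Scorer.score hook using the new `kind` field.
// Tuple shape per the patch: [docname, title, anchor, descr, score, filename, kind].
const Scorer = {
  score: (result) => {
    const [docname, title, anchor, descr, score, filename, kind] = result;
    return kind === "title" ? score + 5 : score; // hypothetical boost for title matches
  },
};
// Themes can also target the class added by _displayItem, e.g. a rule such as
// `li.kind-title { font-weight: 600; }` styles title matches differently.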
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
Train your own model - docTR documentation
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
AWS Lambda - docTR documentation
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
Share your model with the community - docTR documentation
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
Integrate contributions into your pipeline - docTR documentation
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
Choose a ready to use dataset - docTR documentation
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
Preparing your model for inference - docTR documentation
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
Choosing the right model - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
- doctr.datasets.core - docTR documentation
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
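For reference, the deleted VisionDataset above owned the download-and-extract flow that dataset subclasses relied on. A sketch of how a subclass would have used it, assuming the legacy import path and a hypothetical archive URL:

    from doctr.datasets.core import VisionDataset  # legacy path, removed by this diff

    class MyDataset(VisionDataset):
        def __init__(self, download: bool = False) -> None:
            # hypothetical archive, cached under ~/.cache/doctr/datasets and unzipped
            super().__init__(
                url="https://example.com/my_dataset.zip",
                file_name="my_dataset.zip",
                file_hash=None,
                extract_archive=True,
                download=download,  # a missing archive without download=True raises ValueError
            )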
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
doctr.datasets.detection - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
doctr.datasets.doc_artefacts - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# # List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
doctr.datasets.generator.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
doctr.datasets.ic03 - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
doctr.datasets.ic13 - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
doctr.datasets.iiit5k - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
doctr.datasets.iiithws - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
doctr.datasets.imgur5k - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
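The loader now accepts an explicit collate_fn and exposes __len__. A minimal sketch of both additions, extending the docstring example above:

    from doctr.datasets import CORD, DataLoader

    train_set = CORD(train=True, download=True)
    train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
    print(len(train_loader))  # number of batches per epoch, via the new __len__

    # A custom collate_fn takes precedence over the dataset's own and over default_collate
    identity_loader = DataLoader(train_set, batch_size=4, collate_fn=lambda samples: samples)

    images, targets = next(iter(train_loader))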
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
doctr.datasets.mjsynth - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
doctr.datasets.ocr - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
doctr.datasets.recognition - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder each row of 8 flattened coordinates into a (4, 2) array of
+ # (x, y) points (top left, top right, bottom right, bottom left corners); blank lines were filtered above
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
doctr.datasets.svhn - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
doctr.datasets.svt - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
doctr.datasets.synthtext - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionnary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char still not in vocab, return unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a numpy array
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
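The encoding helpers above now support explicit EOS/SOS/PAD symbols. A small sketch of the padding behaviour, assuming the helpers are imported from doctr.datasets.utils and using a hypothetical three-character vocab:

    from doctr.datasets.utils import decode_sequence, encode_sequences

    vocab = "abc"  # hypothetical vocab: index 0 -> 'a', 1 -> 'b', 2 -> 'c'

    # With pad set, each word is encoded, followed by one EOS, then padded
    encoded = encode_sequences(["ab", "c"], vocab, eos=3, pad=4)
    print(encoded)  # [[0 1 3 4]
                    #  [2 3 4 4]]

    # decode_sequence maps indices back through the vocab string
    print(decode_sequence([0, 1], vocab))  # 'ab'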
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
doctr.datasets.wildreceipt - docTR documentation
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- doctr.documents.elements - docTR documentation
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
-
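For reference, the deleted element classes above composed into a renderable tree with nested dict export. A sketch of that legacy API, assuming the old doctr.documents import path:

    from doctr.documents.elements import Line, Word  # legacy path, removed by this diff

    word = Word(value="invoice", confidence=0.94, geometry=((0.1, 0.1), (0.3, 0.15)))
    line = Line(words=[word])  # geometry resolves to the smallest bbox enclosing its words
    print(line.render())  # 'invoice'
    print(line.export()["words"][0]["value"])  # nested dict export -> 'invoice'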
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- doctr.documents.reader - docTR documentation
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the document loaded as a fitz.Document, whose pages can then be rendered into numpy ndarrays
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 PDF;
- to increase the resolution while roughly preserving the A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
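convert_page_to_numpy is internal and has no Example block; a minimal sketch of how it can be driven, assuming a document opened with read_pdf (the path is a placeholder):

    >>> from doctr.documents import read_pdf
    >>> doc = read_pdf("path/to/your/doc.pdf")
    >>> page = convert_page_to_numpy(doc[0], output_size=(1024, 726))  # H x W x 3 uint8 ndarray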
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
-
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: expansion ratio used to unclip (unshrink) polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: polygon vertices, as an array of (x, y) coordinates
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
-
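To make the unclip step concrete: a 10 x 10 square with unclip_ratio=1.5 is offset by area * ratio / perimeter = 100 * 1.5 / 40 = 3.75 px on each side. A hedged sketch (exact output depends on pyclipper's integer rounding):

    >>> import numpy as np
    >>> proc = DBPostProcessor(unclip_ratio=1.5)
    >>> square = np.array([[0, 0], [10, 0], [10, 10], [0, 10]])
    >>> proc.polygon_to_box(square)  # (x, y, w, h) of the expanded box, roughly (-4, -4, 19, 19)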
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channels to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1):  # walk from the coarsest map down to the finest
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature map is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
-
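A quick numeric check of the law-of-cosines distance above: for the segment a = (0, 0), b = (6, 0) and the grid point (3, 1), the perpendicular foot falls inside the segment, so the expected distance is 1 (when cosin is negative, the nearest-endpoint distance is used instead):

    >>> import numpy as np
    >>> xs, ys = np.full((1, 1), 3.), np.full((1, 1), 1.)
    >>> DBNet.compute_distance(xs, ys, np.array([0., 0.]), np.array([6., 0.]))  # array([[1.]])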
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon : array of coord., to draw the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
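Both draw_thresh_map above and compute_target below apply the DB paper's offset D = A * (1 - r^2) / L, where A is the polygon area, L its perimeter and r the shrink_ratio (0.4 here). For a 100 x 10 box: D = 1000 * (1 - 0.16) / 220 ≈ 3.8 px, used positively to pad the threshold map and negatively to shrink the segmentation target.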
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
-
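Collecting the terms with the scales hard-coded above, the objective reads: total_loss = 10 * l1_loss + 5 * balanced_bce_loss + dice_loss, i.e. the L1 term on the threshold map is weighted 10x and the balanced BCE on the probability map 5x relative to the dice term on the approximate binary map.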
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
-
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from the LinkNet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num):  # label 0 is the background component
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
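A hedged end-to-end sketch of this post-processor on a synthetic map, assuming the DetectionPostProcessor base class needs no extra setup (the 0.15 threshold mirrors the default bin_thresh):

    >>> import numpy as np
    >>> proc = LinkNetPostProcessor()
    >>> pred = np.zeros((64, 64), dtype=np.float32)
    >>> pred[10:20, 10:40] = 0.9  # one synthetic text blob
    >>> proc.bitmap_to_boxes(pred, (pred > 0.15).astype(np.uint8))  # (N, 5) array: xmin, ymin, xmax, ymax, score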
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
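For shape intuition: each decoder block bottlenecks to in_chan // 4 channels, upsamples 2x via the strided transposed convolution, then projects to out_chan. A minimal sketch, assuming doctr's conv_sequence is importable in scope:

    >>> import tensorflow as tf
    >>> block = decoder_block(in_chan=128, out_chan=64)
    >>> block(tf.zeros((1, 32, 32, 128))).shape  # TensorShape([1, 64, 64, 64])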
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
-
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized TFLite model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
-
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with a greedy CTC decoder (tf.nn.ctc_greedy_decoder)
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Perform CTC decoding of the raw model output, then map the decoded
- predictions back to characters with the label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of ground-truth label strings for the batch
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len,))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process predictions
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
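Since _crnn patches the default configuration with matching keyword arguments before building the model, the public constructors can be customized without touching default_cfgs. A hedged sketch (the keys come from the config dict above; the custom values are illustrative):

from doctr.models import crnn_vgg16_bn

# A custom vocab means the pretrained head would not match, hence pretrained=False
model = crnn_vgg16_bn(pretrained=False, rnn_units=256, vocab="0123456789")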
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
-
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
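To make the shape bookkeeping above concrete, a small sketch feeding random tensors through the attention module defined in this removed file (the dimensions are illustrative, not prescribed by the model):

import tensorflow as tf

attention = AttentionModule(attention_units=128)
features = tf.random.uniform([1, 8, 32, 512])     # (N, H, W, C) backbone feature map
hidden_state = tf.random.uniform([1, 1, 1, 512])  # (N, 1, 1, rnn_units)
glimpse = attention(features, hidden_state)
print(glimpse.shape)  # (1, 512): one attended feature vector per sample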
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill([features.shape[0]], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
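As a small illustration of the masking step above (with hypothetical word lengths), tf.sequence_mask keeps only the timesteps up to and including the EOS token:

import tensorflow as tf

seq_len = tf.constant([2, 4])            # word lengths before the <eos> step
mask = tf.sequence_mask(seq_len + 1, 6)  # 6 decoder timesteps
print(mask.numpy())
# [[ True  True  True False False False]
#  [ True  True  True  True  True False]]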
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process predictions
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example:
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
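Assuming the updated signature above, a minimal usage sketch of the rewritten predictor factory (the crop dimensions follow the recognition input shape used elsewhere in this diff):

import numpy as np
from doctr.models import recognition_predictor

predictor = recognition_predictor("crnn_vgg16_bn", pretrained=True, batch_size=64)
crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)  # a single word crop
out = predictor([crop])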
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
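Given the expanded signature above, a hedged usage sketch of the new ocr_predictor options (the argument values are illustrative):

import numpy as np
from doctr.models import ocr_predictor

model = ocr_predictor(
    "db_resnet50",
    "crnn_vgg16_bn",
    pretrained=True,
    assume_straight_pages=False,
    export_as_straight_boxes=True,
)
input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
out = model([input_page])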
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
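Mirroring the docstring above, a hedged sketch of building the KIE predictor with a few of the newly exposed options (values illustrative):

import numpy as np
from doctr.models import kie_predictor

model = kie_predictor("db_resnet50", "crnn_vgg16_bn", pretrained=True, detect_language=True)
input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
out = model([input_page])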
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
-
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example:
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example:
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example:
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example:
- >>> from doctr.transforms import RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
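Since these modules share the same call interface, they compose freely; a small sketch chaining the pieces above (the probability and delta values are illustrative):

import tensorflow as tf
from doctr.transforms import Compose, OneOf, RandomApply, RandomBrightness, RandomContrast

augment = Compose([
    RandomApply(RandomBrightness(max_delta=0.2), p=0.5),  # applied half the time
    OneOf([RandomContrast(delta=0.2), RandomBrightness(max_delta=0.1)]),  # pick one of the two
])
out = augment(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))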
-
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
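Given the definition above, a quick illustration of the four tolerance levels (assuming the anyascii transliteration shown, e.g. "é" -> "e"):

print(string_match("Hello", "Hello"))  # (True, True, True, True)
print(string_match("Hello", "hello"))  # (False, True, False, True)
print(string_match("élan", "elan"))    # (False, False, True, True)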
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
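Putting update and summary together, the docstring example above yields one exact-match score per tolerance level ("Hello" vs "hello" only matches caselessly):

from doctr.utils.metrics import TextMatch

metric = TextMatch()
metric.update(["Hello", "world"], ["hello", "world"])
print(metric.summary())
# {'raw': 0.5, 'caseless': 1.0, 'anyascii': 0.5, 'unicase': 1.0}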
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
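A quick worked example of box_iou on two axis-aligned boxes (the second box covers half of the first):

import numpy as np

boxes_1 = np.array([[0, 0, 100, 100]])
boxes_2 = np.array([[0, 0, 50, 100]])
print(box_iou(boxes_1, boxes_2))  # [[0.5]]: intersection 5000 over union 10000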
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
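A worked example of polygon_iou on two unit-offset squares, following the (N, 4, 2) format above:

import numpy as np

square = np.array([[[0, 0], [2, 0], [2, 2], [0, 2]]], dtype=np.float32)
shifted = square + 1  # overlaps the original on a 1x1 patch
print(polygon_iou(square, shifted))  # [[0.1428...]]: 1 / (4 + 4 - 1)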
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
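A minimal sketch of the greedy suppression above: the first two boxes overlap with IoU 0.81 (81 / 100), so only the higher-scoring one survives a 0.5 threshold, while the disjoint third box is kept:

>>> import numpy as np
>>> boxes = np.array([
...     [0, 0, 10, 10, 0.9],
...     [1, 1, 10, 10, 0.8],
...     [20, 20, 30, 30, 0.7],
... ])
>>> nms(boxes, thresh=0.5)  # -> [0, 2]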
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
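The update step above relies on the Hungarian algorithm to build a one-to-one matching that maximizes total IoU before thresholding; a small sketch of that step in isolation:

>>> import numpy as np
>>> from scipy.optimize import linear_sum_assignment
>>> iou_mat = np.array([[0.7, 0.1], [0.2, 0.3]])
>>> gt_indices, pred_indices = linear_sum_assignment(-iou_mat)  # picks pairs (0, 0) and (1, 1)
>>> int((iou_mat[gt_indices, pred_indices] >= 0.5).sum())  # only the first pair clears iou_thresh=0.5
1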
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and strings for both the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
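The four tallies above come from string_match, a helper defined earlier in this module (outside this excerpt). Assuming it returns the four booleans in the order tallied (raw, caseless, anyascii, unicase) and that the anyascii package transliterates É to E, its semantics look like:

>>> string_match("Étape", "Etape")  # raw, caseless, anyascii, unicase
(False, False, True, True)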
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and labels for both the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and mean IoU scores
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
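Unlike OCRMetric, the comparison on kept pairs here is a vectorized equality test on class indices rather than a string match; in isolation:

>>> import numpy as np
>>> gt_labels, pred_labels = np.array([0, 1]), np.array([0, 2])
>>> gt_indices, pred_indices = np.array([0, 1]), np.array([0, 1])
>>> is_kept = np.array([True, True])
>>> int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())  # -> 1, classes agree only for the first pair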
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
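The relative-to-absolute conversion above can be checked numerically; on a (height, width) = (100, 200) page, the relative box below becomes a 60 x 20 px rectangle anchored at (20, 10):

>>> patch = rect_patch(((0.1, 0.1), (0.4, 0.3)), (100, 200), label="word", color=(0, 0, 1))
>>> patch.get_xy(), patch.get_width(), patch.get_height()  # -> ((20.0, 10.0), 60.0, 20.0)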
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
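A short sketch of the dispatch: a 2-point tuple is routed to rect_patch, while a (4, 2) array goes to polygon_patch:

>>> import numpy as np
>>> straight = ((0.1, 0.1), (0.4, 0.3))
>>> rotated = np.array([[0.1, 0.1], [0.4, 0.1], [0.4, 0.3], [0.1, 0.3]])
>>> type(create_obj_patch(straight, (100, 200))).__name__  # -> 'Rectangle'
>>> type(create_obj_patch(rotated, (100, 200))).__name__  # -> 'Polygon'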
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors colors for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
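Hues are spread evenly around the color wheel with slight random jitter on lightness and saturation, so each class gets a visually distinct RGB triple in [0, 1]:

>>> colors = get_colors(3)  # three hues, 120 degrees apart
>>> len(colors), all(0.0 <= c <= 1.0 for rgb in colors for c in rgb)  # -> (3, True)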
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mlp Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page of KIE predictions, with one color per predicted class
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%})",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mlp Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
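A minimal usage sketch; draw_boxes rasterizes the rectangles onto the image with cv2 and already calls plt.imshow internally, so a plt.show() afterwards displays the result:

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> image = np.zeros((100, 200, 3), dtype=np.uint8)
>>> boxes = np.array([[0.1, 0.1, 0.4, 0.3]])  # relative (xmin, ymin, xmax, ymax)
>>> draw_boxes(boxes, image, color=(255, 0, 0))  # 2 px red rectangle from (20, 10) to (80, 30)
>>> plt.show()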
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
-
-
+
+
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
-
-
+
+
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-..autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developper mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Either performed at once or separately, to each task corresponds a type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors lets you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, we warm-up the model and then we measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection will be used to produces cropped images that will be passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedure. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module regroups non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model performances.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
-
- doctr.datasets - docTR documentation
-
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred
-framework can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
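-A minimal sketch of reusing this download machinery in a custom dataset (the URL and file name below are placeholders, not a real archive):
->>> from doctr.datasets.core import VisionDataset
->>> class MyDataset(VisionDataset):
-...     def __init__(self, **kwargs):
-...         # delegate verified download & extraction to the abstract base class
-...         super().__init__(url="https://example.com/archive.zip", file_name="archive.zip",
-...                          extract_archive=True, download=True, **kwargs)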
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
->>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
->>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
->>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its own way of loading a sample, but batch aggregation and iteration are handled by a dedicated object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
->>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-Name | size | characters
-digits | 10 | 0123456789
-ascii_letters | 52 | abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-punctuation | 32 | !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-currency | 5 | £€¥¢฿
-latin | 96 | 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-french | 154 | 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as a mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a numpy array
-
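-A short usage sketch (the vocab string here is deliberately tiny; in practice you would pass one of the vocabs listed above):
->>> from doctr.datasets import encode_sequences
->>> encoded = encode_sequences(sequences=["hello", "world"], vocab="abcdefghijklmnopqrstuvwxyz", target_size=8)
->>> # encoded: numpy array of shape (2, 8); unused positions hold the eos value (-1 by default)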
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
-
- doctr.documents - docTR documentation
-
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page's size
-
-
-
-
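-For illustration, a Word can be built by hand with relative coordinates (the values below are arbitrary):
->>> from doctr.documents import Word
->>> word = Word(value="hello", confidence=0.99, geometry=((0.1, 0.1), (0.3, 0.15)))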
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same height but in different columns belong to two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
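-Since the hierarchy nests Pages, Blocks, Lines and Words, a typical way to walk an exported result is a set of nested loops. A hedged sketch, assuming each element exposes its children under attribute names matching the constructor arguments above and that `doc` is a Document produced by a predictor:
->>> for page in doc.pages:
-...     for block in page.blocks:
-...         for line in block.lines:
-...             for word in line.words:
-...                 print(word.value, word.confidence)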
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert its pages into images in numpy format
-
-- Example::
->>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
->>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF, returned as a bytes stream
-
-- Example::
->>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple file formats
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
->>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
->>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
->>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
->>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
->>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages' annotations, each represented as a list of (bounding box, value) tuples
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
->>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages' artefacts, each represented as a list of bounding boxes
-
-
-
-
-
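-Putting these pieces together, a hedged end-to-end sketch (the file path is a placeholder):
->>> from doctr.documents import DocumentFile
->>> pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
->>> pages = pdf_doc.as_images()   # numpy pages, ready to feed a predictor
->>> words = pdf_doc.get_words()   # native PDF text annotations, no OCR involved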
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architecture's speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor (see the sketch after this list)
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
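+A hedged sketch of that three-line workflow with the current API (the file path is a placeholder):
+>>> from doctr.io import DocumentFile
+>>> from doctr.models import ocr_predictor
+>>> model = ocr_predictor(pretrained=True)       # detection + recognition, pretrained
+>>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
+>>> result = model(doc)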
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
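+Architectures from the two lists above can be mixed and matched when building a predictor; a hedged sketch (the exact string identifiers may vary by release):
+>>> from doctr.models import ocr_predictor
+>>> predictor = ocr_predictor(det_arch="db_resnet50", reco_arch="parseq", pretrained=True)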
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
Supported contribution modules
-
+
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({…}) (regenerated minified Sphinx search index for latest/searchindex.js; full payload omitted)
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/latest/using_doctr/custom_models_training.html b/latest/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/latest/using_doctr/custom_models_training.html
+++ b/latest/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/latest/using_doctr/running_on_aws.html b/latest/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/latest/using_doctr/running_on_aws.html
+++ b/latest/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/latest/using_doctr/sharing_models.html b/latest/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/latest/using_doctr/sharing_models.html
+++ b/latest/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/latest/using_doctr/using_contrib_modules.html b/latest/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/latest/using_doctr/using_contrib_modules.html
+++ b/latest/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/latest/using_doctr/using_datasets.html b/latest/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/latest/using_doctr/using_datasets.html
+++ b/latest/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/latest/using_doctr/using_model_export.html b/latest/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/latest/using_doctr/using_model_export.html
+++ b/latest/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/latest/using_doctr/using_models.html b/latest/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/latest/using_doctr/using_models.html
+++ b/latest/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/modules/contrib.html b/modules/contrib.html
index 22b0c508a6..b8878635b6 100644
--- a/modules/contrib.html
+++ b/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -376,7 +376,7 @@ Supported contribution modules
-
+
diff --git a/modules/datasets.html b/modules/datasets.html
index 0fe4b78d48..dfcacbc96e 100644
--- a/modules/datasets.html
+++ b/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1077,7 +1077,7 @@ Returns:
-
+
diff --git a/modules/io.html b/modules/io.html
index 924d292c59..77e9e017bf 100644
--- a/modules/io.html
+++ b/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -756,7 +756,7 @@ Returns:¶
-
+
diff --git a/modules/models.html b/modules/models.html
index bf45d11a71..f4a9833365 100644
--- a/modules/models.html
+++ b/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1598,7 +1598,7 @@ Args:¶
-
+
diff --git a/modules/transforms.html b/modules/transforms.html
index 6d77d16e7b..bc254c867b 100644
--- a/modules/transforms.html
+++ b/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -831,7 +831,7 @@ Args:¶
-
+
diff --git a/modules/utils.html b/modules/utils.html
index 3dd3ecbd96..6784d81f6f 100644
--- a/modules/utils.html
+++ b/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -711,7 +711,7 @@ Args:¶
-
+
diff --git a/notebooks.html b/notebooks.html
index f3ea994e49..647f73d4eb 100644
--- a/notebooks.html
+++ b/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -387,7 +387,7 @@ docTR Notebooks
-
+
diff --git a/search.html b/search.html
index f0693e2c97..0e0da5efb3 100644
--- a/search.html
+++ b/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -336,7 +336,7 @@
-
+
diff --git a/searchindex.js b/searchindex.js
index 8598997441..df18967072 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-[... previous Sphinx search index for the root searchindex.js omitted: the minified Search.setIndex payload enumerates the same page titles, vocab tables, and doctr API symbols under the pre-update document numbering ...]
"objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [1, 7, 8, 10, 14, 17], "0": [1, 3, 6, 9, 10, 12, 15, 16, 18], "00": 18, "01": 18, "0123456789": 6, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "02562": 8, "03": 18, "035": 18, "0361328125": 18, "04": 18, "05": 18, "06": 18, "06640625": 18, "07": 18, "08": [9, 18], "09": 18, "0966796875": 18, "1": [6, 7, 8, 9, 10, 12, 16, 18], "10": [6, 10, 18], "100": [6, 9, 10, 16, 18], "1000": 18, "101": 6, "1024": [8, 12, 18], "104": 6, "106": 6, "108": 6, "1095": 16, "11": 18, "110": 10, "1107": 16, "114": 6, "115": 6, "1156": 16, "116": 6, "118": 6, "11800h": 18, "11th": 18, "12": 18, "120": 6, "123": 6, "126": 6, "1268": 16, "128": [8, 12, 17, 18], "13": 18, "130": 6, "13068": 16, "131": 6, "1337891": 16, "1357421875": 18, "1396484375": 18, "14": 18, "1420": 18, "14470v1": 6, "149": 16, "15": 18, "150": [10, 18], "1552": 18, "16": [8, 17, 18], "1630859375": 18, "1684": 18, "16x16": 8, "17": 18, "1778": 18, "1782": 18, "18": [8, 18], "185546875": 18, "1900": 18, "1910": 8, "19342": 16, "19370": 16, "195": 6, "19598": 16, "199": 18, "1999": 18, "2": [3, 4, 6, 7, 9, 15, 18], "20": 18, "200": 10, "2000": 16, "2003": [4, 6], "2012": 6, "2013": [4, 6], "2015": 6, "2019": 4, "207901": 16, "21": 18, "2103": 6, "2186": 16, "21888": 16, "22": 18, "224": [8, 9], "225": 9, "22672": 16, "229": [9, 16], "23": 18, "233": 16, "236": 6, "24": 18, "246": 16, "249": 16, "25": 18, "2504": 18, "255": [7, 8, 9, 10, 18], "256": 8, "257": 16, "26": 18, "26032": 16, "264": 12, "27": 18, "2700": 16, "2710": 18, "2749": 12, "28": 18, "287": 12, "29": 18, "296": 12, "299": 12, "2d": 18, "3": [3, 4, 7, 8, 9, 10, 17, 18], "30": 18, "300": 16, "3000": 16, "301": 12, "30595": 18, "30ghz": 18, "31": 8, "32": [6, 8, 9, 12, 16, 17, 18], "3232421875": 18, "33": [9, 18], "33402": 16, "33608": 16, "34": [8, 18], "340": 18, "3456": 18, "3515625": 18, "36": 18, "360": 16, "37": [6, 18], "38": 18, "39": 18, "4": [8, 9, 10, 18], "40": 18, "406": 9, "41": 18, "42": 18, "43": 18, "44": 18, "45": 18, "456": 9, "46": 18, "47": 18, "472": 16, "48": [6, 18], "485": 9, "49": 18, "49377": 16, "5": [6, 9, 10, 15, 18], "50": [8, 16, 18], "51": 18, "51171875": 18, "512": 8, "52": [6, 18], "529": 18, "53": 18, "54": 18, "540": 18, "5478515625": 18, "55": 18, "56": 18, "57": 18, "58": [6, 18], "580": 18, "5810546875": 18, "583": 18, "59": 18, "597": 18, "5k": [4, 6], "5m": 18, "6": [9, 18], "60": 9, "600": [8, 10, 18], "61": 18, "62": 18, "626": 16, "63": 18, "64": [8, 9, 18], "641": 18, "647": 16, "65": 18, "66": 18, "67": 18, "68": 18, "69": 18, "693": 12, "694": 12, "695": 12, "6m": 18, "7": 18, "70": [6, 10, 18], "707470": 16, "71": [6, 18], "7100000": 16, "7141797": 16, "7149": 16, "72": 18, "72dpi": 7, "73": 18, "73257": 16, "74": 18, "75": [9, 18], "7581382": 16, "76": 18, "77": 18, "772": 12, "772875": 16, "78": 18, "785": 12, "79": 18, "793533": 16, "796": 16, "798": 12, "7m": 18, "8": [8, 9, 18], "80": 18, "800": [8, 10, 16, 18], "81": 18, "82": 18, "83": 18, "84": 18, "849": 16, "85": 18, "8564453125": 18, "857": 18, "85875": 16, "86": 18, "8603515625": 18, "87": 18, "8707": 16, "88": 18, "89": 18, "9": [3, 9, 18], "90": 18, "90k": 6, "90kdict32px": 6, "91": 18, "914085328578949": 18, "92": 18, "93": 18, "94": [6, 18], "95": 
[10, 18], "9578408598899841": 18, "96": 18, "97": 18, "98": 18, "99": 18, "9949972033500671": 18, "A": [1, 2, 4, 6, 7, 8, 11, 17], "As": 2, "Be": 18, "Being": 1, "By": 13, "For": [1, 2, 3, 12, 18], "If": [2, 7, 8, 12, 18], "In": [2, 6, 16], "It": [9, 14, 15, 17], "Its": [4, 8], "No": [1, 18], "Of": 6, "Or": [15, 17], "The": [1, 2, 6, 7, 10, 13, 15, 16, 17, 18], "Then": 8, "To": [2, 3, 13, 14, 15, 17, 18], "_": [1, 6, 8], "__call__": 18, "_build": 2, "_i": 10, "ab": 6, "abc": 17, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "abdef": [6, 16], "abl": [16, 18], "about": [1, 16, 18], "abov": 18, "abstractdataset": 6, "abus": 1, "accept": 1, "access": [4, 7, 16, 18], "account": [1, 14], "accur": 18, "accuraci": 10, "achiev": 17, "act": 1, "action": 1, "activ": 4, "ad": [2, 8, 9], "adapt": 1, "add": [9, 10, 14, 18], "add_hook": 18, "add_label": 10, "addit": [2, 3, 7, 15, 18], "addition": [2, 18], "address": [1, 7], "adjust": 9, "advanc": 1, "advantag": 17, "advis": 2, "aesthet": [4, 6], "affect": 1, "after": [14, 18], "ag": 1, "again": 8, "aggreg": [10, 16], "aggress": 1, "align": [1, 7, 9], "all": [1, 2, 5, 6, 7, 9, 10, 15, 16, 18], "allow": [1, 17], "along": 18, "alreadi": [2, 17], "also": [1, 8, 14, 15, 16, 18], "alwai": 16, "an": [1, 2, 4, 6, 7, 8, 10, 15, 17, 18], "analysi": [7, 15], "ancient_greek": 6, "angl": [7, 9], "ani": [1, 6, 7, 8, 9, 10, 17, 18], "annot": 6, "anot": 16, "anoth": [8, 12, 16], "answer": 1, "anyascii": 10, "anyon": 4, "anyth": 15, "api": [2, 4], "apolog": 1, "apologi": 1, "app": 2, "appear": 1, "appli": [1, 6, 9], "applic": [4, 8], "appoint": 1, "appreci": 14, "appropri": [1, 2, 18], "ar": [1, 2, 3, 5, 6, 7, 9, 10, 11, 15, 16, 18], "arab": 6, "arabic_diacrit": 6, "arabic_lett": 6, "arabic_punctu": 6, "arbitrarili": [4, 8], "arch": [8, 14], "architectur": [4, 8, 14, 15], "area": 18, "argument": [6, 7, 8, 10, 12, 18], "around": 1, "arrai": [7, 9, 10], "art": [4, 15], "artefact": [10, 15, 18], "artefact_typ": 7, "artifici": [4, 6], "arxiv": [6, 8], "asarrai": 10, "ascii_lett": 6, "aspect": [4, 8, 9, 18], "assess": 10, "assign": 10, "associ": 7, "assum": 8, "assume_straight_pag": [8, 12, 18], "astyp": [8, 10, 18], "attack": 1, "attend": [4, 8], "attent": [1, 8], "autom": 4, "automat": 18, "autoregress": [4, 8], "avail": [1, 4, 5, 9], "averag": [9, 18], "avoid": [1, 3], "aw": [4, 18], "awar": 18, "azur": 18, "b": [8, 10, 18], "b_j": 10, "back": 2, "backbon": 8, "backend": 18, "background": 16, "bangla": 6, "bar": 15, "bar_cod": 16, "base": [4, 8, 15], "baselin": [4, 8, 18], "batch": [6, 8, 9, 15, 16, 18], "batch_siz": [6, 12, 15, 16, 17], "bblanchon": 3, "bbox": 18, "becaus": 13, "been": [2, 10, 16, 18], "befor": [6, 8, 9, 18], "begin": 10, "behavior": [1, 18], "being": [10, 18], "belong": 18, "benchmark": 18, "best": 1, "better": [11, 18], "between": [9, 10, 18], "bgr": 7, "bilinear": 9, "bin_thresh": 18, "binar": [4, 8, 18], "binari": [7, 17, 18], "bit": 17, "block": [10, 18], "block_1_1": 18, "blur": 9, "bmvc": 6, "bn": 14, "bodi": [1, 18], "bool": [6, 7, 8, 9, 10], "boolean": [8, 18], "both": [4, 6, 9, 16, 18], "bottom": [8, 18], "bound": [6, 7, 8, 9, 10, 15, 16, 18], "box": [6, 7, 8, 9, 10, 15, 16, 18], "box_thresh": 18, "bright": 9, "browser": [2, 4], "build": [2, 3, 17], "built": 2, "byte": [7, 18], "c": [3, 7, 10], "c_j": 10, "cach": [2, 6, 13], "cache_sampl": 6, "call": 17, "callabl": [6, 9], "can": [2, 3, 12, 13, 14, 15, 16, 18], "capabl": [2, 11, 18], "case": [6, 10], "cf": 18, "cfg": 18, "challeng": 6, "challenge2_test_task12_imag": 6, 
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[1, "correction"]], "2. Warning": [[1, "warning"]], "3. Temporary Ban": [[1, "temporary-ban"]], "4. Permanent Ban": [[1, "permanent-ban"]], "AWS Lambda": [[13, null]], "Advanced options": [[18, "advanced-options"]], "Args:": [[6, "args"], [6, "id4"], [6, "id7"], [6, "id10"], [6, "id13"], [6, "id16"], [6, "id19"], [6, "id22"], [6, "id25"], [6, "id29"], [6, "id32"], [6, "id37"], [6, "id40"], [6, "id46"], [6, "id49"], [6, "id50"], [6, "id51"], [6, "id54"], [6, "id57"], [6, "id60"], [6, "id61"], [7, "args"], [7, "id2"], [7, "id3"], [7, "id4"], [7, "id5"], [7, "id6"], [7, "id7"], [7, "id10"], [7, "id12"], [7, "id14"], [7, "id16"], [7, "id20"], [7, "id24"], [7, "id28"], [8, "args"], [8, "id3"], [8, "id8"], [8, "id13"], [8, "id17"], [8, "id21"], [8, "id26"], [8, "id31"], [8, "id36"], [8, "id41"], [8, "id46"], [8, "id50"], [8, "id54"], [8, "id59"], [8, "id63"], [8, "id68"], [8, "id73"], [8, "id77"], [8, "id81"], [8, "id85"], [8, "id90"], [8, "id95"], [8, "id99"], [8, "id104"], [8, "id109"], [8, "id114"], [8, "id119"], [8, "id123"], [8, "id127"], [8, "id132"], [8, "id137"], [8, "id142"], [8, "id146"], [8, "id150"], [8, "id155"], [8, "id159"], [8, "id163"], [8, "id167"], [8, "id169"], [8, "id171"], [8, "id173"], [9, "args"], [9, "id1"], [9, "id2"], [9, "id3"], [9, "id4"], [9, "id5"], [9, "id6"], [9, "id7"], [9, "id8"], [9, "id9"], [9, "id10"], [9, "id11"], [9, "id12"], [9, "id13"], [9, "id14"], [9, "id15"], [9, "id16"], [9, "id17"], [9, "id18"], [9, "id19"], [10, "args"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"]], "Artefact": [[7, "artefact"]], "ArtefactDetection": [[15, "artefactdetection"]], "Attribution": [[1, "attribution"]], "Available Datasets": [[16, "available-datasets"]], "Available architectures": [[18, "available-architectures"], [18, "id1"], [18, "id2"]], "Available contribution modules": [[15, "available-contribution-modules"]], "Block": [[7, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[16, null]], "Choosing the right model": [[18, null]], "Classification": [[14, "classification"]], "Code quality": [[2, "code-quality"]], "Code style verification": [[2, "code-style-verification"]], "Codebase structure": [[2, "codebase-structure"]], "Commits": [[2, "commits"]], "Composing transformations": [[9, "composing-transformations"]], "Continuous Integration": [[2, "continuous-integration"]], "Contributing to docTR": [[2, null]], "Contributor Covenant Code of Conduct": [[1, null]], "Custom dataset loader": [[6, "custom-dataset-loader"]], "Custom orientation classification models": [[12, "custom-orientation-classification-models"]], "Data Loading": [[16, "data-loading"]], "Dataloader": [[6, "dataloader"]], "Detection": [[14, "detection"], [16, "detection"]], "Detection predictors": [[18, "detection-predictors"]], "Developer mode installation": [[2, "developer-mode-installation"]], "Developing docTR": [[2, "developing-doctr"]], "Document": [[7, "document"]], "Document structure": [[7, "document-structure"]], "End-to-End OCR": [[18, "end-to-end-ocr"]], "Enforcement": [[1, "enforcement"]], "Enforcement Guidelines": [[1, "enforcement-guidelines"]], "Enforcement Responsibilities": [[1, "enforcement-responsibilities"]], "Export to ONNX": [[17, "export-to-onnx"]], "Feature requests & bug report": [[2, "feature-requests-bug-report"]], "Feedback": [[2, "feedback"]], "File reading": [[7, "file-reading"]], "Half-precision": [[17, "half-precision"]], "Installation": [[3, null]], 
"Integrate contributions into your pipeline": [[15, null]], "Let\u2019s connect": [[2, "let-s-connect"]], "Line": [[7, "line"]], "Loading from Huggingface Hub": [[14, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[12, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[12, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[4, "main-features"]], "Model optimization": [[17, "model-optimization"]], "Model zoo": [[4, "model-zoo"]], "Modifying the documentation": [[2, "modifying-the-documentation"]], "Naming conventions": [[14, "naming-conventions"]], "OCR": [[16, "ocr"]], "Object Detection": [[16, "object-detection"]], "Our Pledge": [[1, "our-pledge"]], "Our Standards": [[1, "our-standards"]], "Page": [[7, "page"]], "Preparing your model for inference": [[17, null]], "Prerequisites": [[3, "prerequisites"]], "Pretrained community models": [[14, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[14, "pushing-to-the-huggingface-hub"]], "Questions": [[2, "questions"]], "Recognition": [[14, "recognition"], [16, "recognition"]], "Recognition predictors": [[18, "recognition-predictors"]], "Returns:": [[6, "returns"], [7, "returns"], [7, "id11"], [7, "id13"], [7, "id15"], [7, "id19"], [7, "id23"], [7, "id27"], [7, "id31"], [8, "returns"], [8, "id6"], [8, "id11"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id29"], [8, "id34"], [8, "id39"], [8, "id44"], [8, "id49"], [8, "id53"], [8, "id57"], [8, "id62"], [8, "id66"], [8, "id71"], [8, "id76"], [8, "id80"], [8, "id84"], [8, "id88"], [8, "id93"], [8, "id98"], [8, "id102"], [8, "id107"], [8, "id112"], [8, "id117"], [8, "id122"], [8, "id126"], [8, "id130"], [8, "id135"], [8, "id140"], [8, "id145"], [8, "id149"], [8, "id153"], [8, "id158"], [8, "id162"], [8, "id166"], [8, "id168"], [8, "id170"], [8, "id172"], [10, "returns"]], "Scope": [[1, "scope"]], "Share your model with the community": [[14, null]], "Supported Vocabs": [[6, "supported-vocabs"]], "Supported contribution modules": [[5, "supported-contribution-modules"]], "Supported datasets": [[4, "supported-datasets"]], "Supported transformations": [[9, "supported-transformations"]], "Synthetic dataset generator": [[6, "synthetic-dataset-generator"], [16, "synthetic-dataset-generator"]], "Task evaluation": [[10, "task-evaluation"]], "Text Detection": [[18, "text-detection"]], "Text Recognition": [[18, "text-recognition"]], "Text detection models": [[4, "text-detection-models"]], "Text recognition models": [[4, "text-recognition-models"]], "Train your own model": [[12, null]], "Two-stage approaches": [[18, "two-stage-approaches"]], "Unit tests": [[2, "unit-tests"]], "Use your own datasets": [[16, "use-your-own-datasets"]], "Using your ONNX exported model": [[17, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[3, "via-conda-only-for-linux"]], "Via Git": [[3, "via-git"]], "Via Python Package": [[3, "via-python-package"]], "Visualization": [[10, "visualization"]], "What should I do with the output?": [[18, "what-should-i-do-with-the-output"]], "Word": [[7, "word"]], "docTR Notebooks": [[11, null]], "docTR Vocabs": [[6, "id62"]], "docTR: Document Text Recognition": [[4, null]], "doctr.contrib": [[5, null]], "doctr.datasets": [[6, null], [6, "datasets"]], "doctr.io": [[7, null]], "doctr.models": [[8, null]], "doctr.models.classification": [[8, "doctr-models-classification"]], "doctr.models.detection": [[8, "doctr-models-detection"]], "doctr.models.factory": [[8, 
"doctr-models-factory"]], "doctr.models.recognition": [[8, "doctr-models-recognition"]], "doctr.models.zoo": [[8, "doctr-models-zoo"]], "doctr.transforms": [[9, null]], "doctr.utils": [[10, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[7, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[7, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[9, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[6, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[9, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[9, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[6, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[6, "doctr.datasets.loader.DataLoader", false]], 
"db_mobilenet_v3_large() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[8, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[6, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[6, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[7, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[7, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[6, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[6, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[9, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[9, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[6, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[6, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[6, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[6, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[6, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[8, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[9, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[7, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[6, "doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() 
(in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[9, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[8, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[6, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[9, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[7, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[9, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[9, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[9, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[9, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[9, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[9, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[9, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[9, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[9, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[9, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[9, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[9, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[7, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[7, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[7, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[6, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[9, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[8, 
"doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[7, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[7, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[6, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[6, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[6, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[6, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[9, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[10, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[6, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[7, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[6, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[6, 0, 1, "", "CORD"], [6, 0, 1, "", "CharacterGenerator"], [6, 0, 1, "", "DetectionDataset"], [6, 0, 1, "", "DocArtefacts"], [6, 0, 1, "", "FUNSD"], [6, 0, 1, "", "IC03"], [6, 0, 1, "", "IC13"], [6, 0, 1, "", "IIIT5K"], [6, 0, 1, "", "IIITHWS"], [6, 0, 1, "", "IMGUR5K"], [6, 0, 1, "", "MJSynth"], [6, 0, 1, "", "OCRDataset"], [6, 0, 1, "", "RecognitionDataset"], [6, 0, 1, "", "SROIE"], [6, 0, 1, "", "SVHN"], [6, 0, 1, "", "SVT"], [6, 0, 1, "", "SynthText"], [6, 0, 1, "", 
"WILDRECEIPT"], [6, 0, 1, "", "WordGenerator"], [6, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[6, 0, 1, "", "DataLoader"]], "doctr.io": [[7, 0, 1, "", "Artefact"], [7, 0, 1, "", "Block"], [7, 0, 1, "", "Document"], [7, 0, 1, "", "DocumentFile"], [7, 0, 1, "", "Line"], [7, 0, 1, "", "Page"], [7, 0, 1, "", "Word"], [7, 1, 1, "", "decode_img_as_tensor"], [7, 1, 1, "", "read_html"], [7, 1, 1, "", "read_img_as_numpy"], [7, 1, 1, "", "read_img_as_tensor"], [7, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[7, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[7, 2, 1, "", "from_images"], [7, 2, 1, "", "from_pdf"], [7, 2, 1, "", "from_url"]], "doctr.io.Page": [[7, 2, 1, "", "show"]], "doctr.models": [[8, 1, 1, "", "kie_predictor"], [8, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[8, 1, 1, "", "crop_orientation_predictor"], [8, 1, 1, "", "magc_resnet31"], [8, 1, 1, "", "mobilenet_v3_large"], [8, 1, 1, "", "mobilenet_v3_large_r"], [8, 1, 1, "", "mobilenet_v3_small"], [8, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [8, 1, 1, "", "mobilenet_v3_small_page_orientation"], [8, 1, 1, "", "mobilenet_v3_small_r"], [8, 1, 1, "", "page_orientation_predictor"], [8, 1, 1, "", "resnet18"], [8, 1, 1, "", "resnet31"], [8, 1, 1, "", "resnet34"], [8, 1, 1, "", "resnet50"], [8, 1, 1, "", "textnet_base"], [8, 1, 1, "", "textnet_small"], [8, 1, 1, "", "textnet_tiny"], [8, 1, 1, "", "vgg16_bn_r"], [8, 1, 1, "", "vit_b"], [8, 1, 1, "", "vit_s"]], "doctr.models.detection": [[8, 1, 1, "", "db_mobilenet_v3_large"], [8, 1, 1, "", "db_resnet50"], [8, 1, 1, "", "detection_predictor"], [8, 1, 1, "", "fast_base"], [8, 1, 1, "", "fast_small"], [8, 1, 1, "", "fast_tiny"], [8, 1, 1, "", "linknet_resnet18"], [8, 1, 1, "", "linknet_resnet34"], [8, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[8, 1, 1, "", "from_hub"], [8, 1, 1, "", "login_to_hub"], [8, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[8, 1, 1, "", "crnn_mobilenet_v3_large"], [8, 1, 1, "", "crnn_mobilenet_v3_small"], [8, 1, 1, "", "crnn_vgg16_bn"], [8, 1, 1, "", "master"], [8, 1, 1, "", "parseq"], [8, 1, 1, "", "recognition_predictor"], [8, 1, 1, "", "sar_resnet31"], [8, 1, 1, "", "vitstr_base"], [8, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[9, 0, 1, "", "ChannelShuffle"], [9, 0, 1, "", "ColorInversion"], [9, 0, 1, "", "Compose"], [9, 0, 1, "", "GaussianBlur"], [9, 0, 1, "", "GaussianNoise"], [9, 0, 1, "", "LambdaTransformation"], [9, 0, 1, "", "Normalize"], [9, 0, 1, "", "OneOf"], [9, 0, 1, "", "RandomApply"], [9, 0, 1, "", "RandomBrightness"], [9, 0, 1, "", "RandomContrast"], [9, 0, 1, "", "RandomCrop"], [9, 0, 1, "", "RandomGamma"], [9, 0, 1, "", "RandomHorizontalFlip"], [9, 0, 1, "", "RandomHue"], [9, 0, 1, "", "RandomJpegQuality"], [9, 0, 1, "", "RandomResize"], [9, 0, 1, "", "RandomRotate"], [9, 0, 1, "", "RandomSaturation"], [9, 0, 1, "", "RandomShadow"], [9, 0, 1, "", "Resize"], [9, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[10, 0, 1, "", "DetectionMetric"], [10, 0, 1, "", "LocalizationConfusion"], [10, 0, 1, "", "OCRMetric"], [10, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.OCRMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.visualization": [[10, 1, 1, "", "visualize_page"]]}, 
"objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [1, 7, 8, 10, 14, 17], "0": [1, 3, 6, 9, 10, 12, 15, 16, 18], "00": 18, "01": 18, "0123456789": 6, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "02562": 8, "03": 18, "035": 18, "0361328125": 18, "04": 18, "05": 18, "06": 18, "06640625": 18, "07": 18, "08": [9, 18], "09": 18, "0966796875": 18, "1": [6, 7, 8, 9, 10, 12, 16, 18], "10": [6, 10, 18], "100": [6, 9, 10, 16, 18], "1000": 18, "101": 6, "1024": [8, 12, 18], "104": 6, "106": 6, "108": 6, "1095": 16, "11": 18, "110": 10, "1107": 16, "114": 6, "115": 6, "1156": 16, "116": 6, "118": 6, "11800h": 18, "11th": 18, "12": 18, "120": 6, "123": 6, "126": 6, "1268": 16, "128": [8, 12, 17, 18], "13": 18, "130": 6, "13068": 16, "131": 6, "1337891": 16, "1357421875": 18, "1396484375": 18, "14": 18, "1420": 18, "14470v1": 6, "149": 16, "15": 18, "150": [10, 18], "1552": 18, "16": [8, 17, 18], "1630859375": 18, "1684": 18, "16x16": 8, "17": 18, "1778": 18, "1782": 18, "18": [8, 18], "185546875": 18, "1900": 18, "1910": 8, "19342": 16, "19370": 16, "195": 6, "19598": 16, "199": 18, "1999": 18, "2": [3, 4, 6, 7, 9, 15, 18], "20": 18, "200": 10, "2000": 16, "2003": [4, 6], "2012": 6, "2013": [4, 6], "2015": 6, "2019": 4, "207901": 16, "21": 18, "2103": 6, "2186": 16, "21888": 16, "22": 18, "224": [8, 9], "225": 9, "22672": 16, "229": [9, 16], "23": 18, "233": 16, "236": 6, "24": 18, "246": 16, "249": 16, "25": 18, "2504": 18, "255": [7, 8, 9, 10, 18], "256": 8, "257": 16, "26": 18, "26032": 16, "264": 12, "27": 18, "2700": 16, "2710": 18, "2749": 12, "28": 18, "287": 12, "29": 18, "296": 12, "299": 12, "2d": 18, "3": [3, 4, 7, 8, 9, 10, 17, 18], "30": 18, "300": 16, "3000": 16, "301": 12, "30595": 18, "30ghz": 18, "31": 8, "32": [6, 8, 9, 12, 16, 17, 18], "3232421875": 18, "33": [9, 18], "33402": 16, "33608": 16, "34": [8, 18], "340": 18, "3456": 18, "3515625": 18, "36": 18, "360": 16, "37": [6, 18], "38": 18, "39": 18, "4": [8, 9, 10, 18], "40": 18, "406": 9, "41": 18, "42": 18, "43": 18, "44": 18, "45": 18, "456": 9, "46": 18, "47": 18, "472": 16, "48": [6, 18], "485": 9, "49": 18, "49377": 16, "5": [6, 9, 10, 15, 18], "50": [8, 16, 18], "51": 18, "51171875": 18, "512": 8, "52": [6, 18], "529": 18, "53": 18, "54": 18, "540": 18, "5478515625": 18, "55": 18, "56": 18, "57": 18, "58": [6, 18], "580": 18, "5810546875": 18, "583": 18, "59": 18, "597": 18, "5k": [4, 6], "5m": 18, "6": [9, 18], "60": 9, "600": [8, 10, 18], "61": 18, "62": 18, "626": 16, "63": 18, "64": [8, 9, 18], "641": 18, "647": 16, "65": 18, "66": 18, "67": 18, "68": 18, "69": 18, "693": 12, "694": 12, "695": 12, "6m": 18, "7": 18, "70": [6, 10, 18], "707470": 16, "71": [6, 18], "7100000": 16, "7141797": 16, "7149": 16, "72": 18, "72dpi": 7, "73": 18, "73257": 16, "74": 18, "75": [9, 18], "7581382": 16, "76": 18, "77": 18, "772": 12, "772875": 16, "78": 18, "785": 12, "79": 18, "793533": 16, "796": 16, "798": 12, "7m": 18, "8": [8, 9, 18], "80": 18, "800": [8, 10, 16, 18], "81": 18, "82": 18, "83": 18, "84": 18, "849": 16, "85": 18, "8564453125": 18, "857": 18, "85875": 16, "86": 18, "8603515625": 18, "87": 18, "8707": 16, "88": 18, "89": 18, "9": [3, 9, 18], "90": 18, "90k": 6, "90kdict32px": 6, "91": 18, "914085328578949": 18, "92": 18, "93": 18, "94": [6, 18], "95": 
[10, 18], "9578408598899841": 18, "96": 18, "97": 18, "98": 18, "99": 18, "9949972033500671": 18, "A": [1, 2, 4, 6, 7, 8, 11, 17], "As": 2, "Be": 18, "Being": 1, "By": 13, "For": [1, 2, 3, 12, 18], "If": [2, 7, 8, 12, 18], "In": [2, 6, 16], "It": [9, 14, 15, 17], "Its": [4, 8], "No": [1, 18], "Of": 6, "Or": [15, 17], "The": [1, 2, 6, 7, 10, 13, 15, 16, 17, 18], "Then": 8, "To": [2, 3, 13, 14, 15, 17, 18], "_": [1, 6, 8], "__call__": 18, "_build": 2, "_i": 10, "ab": 6, "abc": 17, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "abdef": [6, 16], "abl": [16, 18], "about": [1, 16, 18], "abov": 18, "abstractdataset": 6, "abus": 1, "accept": 1, "access": [4, 7, 16, 18], "account": [1, 14], "accur": 18, "accuraci": 10, "achiev": 17, "act": 1, "action": 1, "activ": 4, "ad": [2, 8, 9], "adapt": 1, "add": [9, 10, 14, 18], "add_hook": 18, "add_label": 10, "addit": [2, 3, 7, 15, 18], "addition": [2, 18], "address": [1, 7], "adjust": 9, "advanc": 1, "advantag": 17, "advis": 2, "aesthet": [4, 6], "affect": 1, "after": [14, 18], "ag": 1, "again": 8, "aggreg": [10, 16], "aggress": 1, "align": [1, 7, 9], "all": [1, 2, 5, 6, 7, 9, 10, 15, 16, 18], "allow": [1, 17], "along": 18, "alreadi": [2, 17], "also": [1, 8, 14, 15, 16, 18], "alwai": 16, "an": [1, 2, 4, 6, 7, 8, 10, 15, 17, 18], "analysi": [7, 15], "ancient_greek": 6, "angl": [7, 9], "ani": [1, 6, 7, 8, 9, 10, 17, 18], "annot": 6, "anot": 16, "anoth": [8, 12, 16], "answer": 1, "anyascii": 10, "anyon": 4, "anyth": 15, "api": [2, 4], "apolog": 1, "apologi": 1, "app": 2, "appear": 1, "appli": [1, 6, 9], "applic": [4, 8], "appoint": 1, "appreci": 14, "appropri": [1, 2, 18], "ar": [1, 2, 3, 5, 6, 7, 9, 10, 11, 15, 16, 18], "arab": 6, "arabic_diacrit": 6, "arabic_lett": 6, "arabic_punctu": 6, "arbitrarili": [4, 8], "arch": [8, 14], "architectur": [4, 8, 14, 15], "area": 18, "argument": [6, 7, 8, 10, 12, 18], "around": 1, "arrai": [7, 9, 10], "art": [4, 15], "artefact": [10, 15, 18], "artefact_typ": 7, "artifici": [4, 6], "arxiv": [6, 8], "asarrai": 10, "ascii_lett": 6, "aspect": [4, 8, 9, 18], "assess": 10, "assign": 10, "associ": 7, "assum": 8, "assume_straight_pag": [8, 12, 18], "astyp": [8, 10, 18], "attack": 1, "attend": [4, 8], "attent": [1, 8], "autom": 4, "automat": 18, "autoregress": [4, 8], "avail": [1, 4, 5, 9], "averag": [9, 18], "avoid": [1, 3], "aw": [4, 18], "awar": 18, "azur": 18, "b": [8, 10, 18], "b_j": 10, "back": 2, "backbon": 8, "backend": 18, "background": 16, "bangla": 6, "bar": 15, "bar_cod": 16, "base": [4, 8, 15], "baselin": [4, 8, 18], "batch": [6, 8, 9, 15, 16, 18], "batch_siz": [6, 12, 15, 16, 17], "bblanchon": 3, "bbox": 18, "becaus": 13, "been": [2, 10, 16, 18], "befor": [6, 8, 9, 18], "begin": 10, "behavior": [1, 18], "being": [10, 18], "belong": 18, "benchmark": 18, "best": 1, "better": [11, 18], "between": [9, 10, 18], "bgr": 7, "bilinear": 9, "bin_thresh": 18, "binar": [4, 8, 18], "binari": [7, 17, 18], "bit": 17, "block": [10, 18], "block_1_1": 18, "blur": 9, "bmvc": 6, "bn": 14, "bodi": [1, 18], "bool": [6, 7, 8, 9, 10], "boolean": [8, 18], "both": [4, 6, 9, 16, 18], "bottom": [8, 18], "bound": [6, 7, 8, 9, 10, 15, 16, 18], "box": [6, 7, 8, 9, 10, 15, 16, 18], "box_thresh": 18, "bright": 9, "browser": [2, 4], "build": [2, 3, 17], "built": 2, "byte": [7, 18], "c": [3, 7, 10], "c_j": 10, "cach": [2, 6, 13], "cache_sampl": 6, "call": 17, "callabl": [6, 9], "can": [2, 3, 12, 13, 14, 15, 16, 18], "capabl": [2, 11, 18], "case": [6, 10], "cf": 18, "cfg": 18, "challeng": 6, "challenge2_test_task12_imag": 6, 
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
diff --git a/using_doctr/custom_models_training.html b/using_doctr/custom_models_training.html
index 580b4368b7..e664c6a950 100644
--- a/using_doctr/custom_models_training.html
+++ b/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -615,7 +615,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/using_doctr/running_on_aws.html b/using_doctr/running_on_aws.html
index ddb0c3c80f..81c38b49f5 100644
--- a/using_doctr/running_on_aws.html
+++ b/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -358,7 +358,7 @@ AWS Lambda
-
+
diff --git a/using_doctr/sharing_models.html b/using_doctr/sharing_models.html
index 07a3b2f2a3..4f5d1d68a5 100644
--- a/using_doctr/sharing_models.html
+++ b/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -540,7 +540,7 @@ Recognition
-
+
diff --git a/using_doctr/using_contrib_modules.html b/using_doctr/using_contrib_modules.html
index b4a10925e6..cf282ff3a4 100644
--- a/using_doctr/using_contrib_modules.html
+++ b/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -411,7 +411,7 @@ ArtefactDetection
-
+
diff --git a/using_doctr/using_datasets.html b/using_doctr/using_datasets.html
index 4a52df36ba..e30b6d6459 100644
--- a/using_doctr/using_datasets.html
+++ b/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -638,7 +638,7 @@ Data Loading
-
+
diff --git a/using_doctr/using_model_export.html b/using_doctr/using_model_export.html
index 2b30ee63a1..ad9d09ed4c 100644
--- a/using_doctr/using_model_export.html
+++ b/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -463,7 +463,7 @@ Using your ONNX exported model
-
+
diff --git a/using_doctr/using_models.html b/using_doctr/using_models.html
index 13cb06116b..5c80dbf62d 100644
--- a/using_doctr/using_models.html
+++ b/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1249,7 +1249,7 @@ Advanced options
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/cord.html b/v0.1.0/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.0/_modules/doctr/datasets/cord.html
+++ b/v0.1.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/detection.html b/v0.1.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.0/_modules/doctr/datasets/detection.html
+++ b/v0.1.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/funsd.html b/v0.1.0/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.0/_modules/doctr/datasets/funsd.html
+++ b/v0.1.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic03.html b/v0.1.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.0/_modules/doctr/datasets/ic03.html
+++ b/v0.1.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic13.html b/v0.1.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.0/_modules/doctr/datasets/ic13.html
+++ b/v0.1.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiit5k.html b/v0.1.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiithws.html b/v0.1.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/imgur5k.html b/v0.1.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/loader.html b/v0.1.0/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.0/_modules/doctr/datasets/loader.html
+++ b/v0.1.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/mjsynth.html b/v0.1.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ocr.html b/v0.1.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.0/_modules/doctr/datasets/ocr.html
+++ b/v0.1.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/recognition.html b/v0.1.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.0/_modules/doctr/datasets/recognition.html
+++ b/v0.1.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/sroie.html b/v0.1.0/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.0/_modules/doctr/datasets/sroie.html
+++ b/v0.1.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svhn.html b/v0.1.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.0/_modules/doctr/datasets/svhn.html
+++ b/v0.1.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svt.html b/v0.1.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.0/_modules/doctr/datasets/svt.html
+++ b/v0.1.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/synthtext.html b/v0.1.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/utils.html b/v0.1.0/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.0/_modules/doctr/datasets/utils.html
+++ b/v0.1.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/wildreceipt.html b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.0/_modules/doctr/io/elements.html b/v0.1.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.0/_modules/doctr/io/elements.html
+++ b/v0.1.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.0/_modules/doctr/io/html.html b/v0.1.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.0/_modules/doctr/io/html.html
+++ b/v0.1.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/base.html b/v0.1.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.0/_modules/doctr/io/image/base.html
+++ b/v0.1.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/tensorflow.html b/v0.1.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/io/pdf.html b/v0.1.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.0/_modules/doctr/io/pdf.html
+++ b/v0.1.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.0/_modules/doctr/io/reader.html b/v0.1.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.0/_modules/doctr/io/reader.html
+++ b/v0.1.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/zoo.html b/v0.1.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/zoo.html b/v0.1.0/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/factory/hub.html b/v0.1.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.0/_modules/doctr/models/factory/hub.html
+++ b/v0.1.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/zoo.html b/v0.1.0/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/zoo.html b/v0.1.0/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.0/_modules/doctr/models/zoo.html
+++ b/v0.1.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/base.html b/v0.1.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/utils/metrics.html b/v0.1.0/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.0/_modules/doctr/utils/metrics.html
+++ b/v0.1.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.0/_modules/doctr/utils/visualization.html b/v0.1.0/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.0/_modules/doctr/utils/visualization.html
+++ b/v0.1.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.0/_modules/index.html b/v0.1.0/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.0/_modules/index.html
+++ b/v0.1.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.0/_sources/getting_started/installing.rst.txt b/v0.1.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.0/_sources/getting_started/installing.rst.txt
+++ b/v0.1.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.0/_static/basic.css b/v0.1.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.0/_static/basic.css
+++ b/v0.1.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.0/_static/doctools.js b/v0.1.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.0/_static/doctools.js
+++ b/v0.1.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.0/_static/language_data.js b/v0.1.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.0/_static/language_data.js
+++ b/v0.1.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.0/_static/searchtools.js b/v0.1.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.0/_static/searchtools.js
+++ b/v0.1.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
- ).replace('${resultCount}', resultCount);
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
+ ).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
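The searchtools.js changes above all serve one feature: each raw search result now carries a seventh "kind" element (one of the SearchResultKind values: "index", "object", "text" or "title"), and _displayItem tags every rendered result with a matching kind-<name> CSS class so themes can style result types differently. The file's commented-out template also documents a Scorer.score hook that receives this tuple. As a minimal illustrative sketch (not part of this patch), a theme could define its own kind-aware Scorer before searchtools.js loads, so that the typeof Scorer === "undefined" guard above leaves it in place:

// Hypothetical theme-side script, loaded before searchtools.js.
// The guard in searchtools.js only installs the default Scorer when
// none already exists, so this object survives.
const Scorer = {
  score: (result) => {
    // result layout: [docname, title, anchor, descr, score, filename, kind]
    const baseScore = result[4];
    const kind = result[6];
    // Illustrative weighting only: nudge title matches above plain text hits.
    return kind === "title" ? baseScore + 5 : baseScore;
  },
};

Note that the default Scorer installed by searchtools.js also defines several static weight fields (at least in upstream Sphinx), so a drop-in replacement would need to replicate those as well; the snippet above only shows the shape of the new result tuple.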
diff --git a/v0.1.0/changelog.html b/v0.1.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.0/changelog.html
+++ b/v0.1.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.0/community/resources.html b/v0.1.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.0/community/resources.html
+++ b/v0.1.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.0/contributing/code_of_conduct.html b/v0.1.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.0/contributing/code_of_conduct.html
+++ b/v0.1.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.0/contributing/contributing.html b/v0.1.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.0/contributing/contributing.html
+++ b/v0.1.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.0/genindex.html b/v0.1.0/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.0/genindex.html
+++ b/v0.1.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.0/getting_started/installing.html b/v0.1.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.0/getting_started/installing.html
+++ b/v0.1.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git
-
+
diff --git a/v0.1.0/index.html b/v0.1.0/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.0/index.html
+++ b/v0.1.0/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.0/modules/contrib.html b/v0.1.0/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.0/modules/contrib.html
+++ b/v0.1.0/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.0/modules/datasets.html b/v0.1.0/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.0/modules/datasets.html
+++ b/v0.1.0/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.0/modules/io.html b/v0.1.0/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.0/modules/io.html
+++ b/v0.1.0/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:
-
+
diff --git a/v0.1.0/modules/models.html b/v0.1.0/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.0/modules/models.html
+++ b/v0.1.0/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:
-
+
diff --git a/v0.1.0/modules/transforms.html b/v0.1.0/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.0/modules/transforms.html
+++ b/v0.1.0/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:
-
+
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:
-
+
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
-Search.setIndex({...})  (one-line minified Sphinx search index for v0.1.0 — all page titles, docnames and API entries — regenerated by the new Sphinx version; blob truncated in the source and omitted here)
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.0/using_doctr/custom_models_training.html b/v0.1.0/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.0/using_doctr/custom_models_training.html
+++ b/v0.1.0/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.0/using_doctr/running_on_aws.html b/v0.1.0/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.0/using_doctr/running_on_aws.html
+++ b/v0.1.0/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
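For context on how this language data is consumed downstream, here is a minimal standalone sketch (our own illustration, not the actual Sphinx implementation) of pruning a tokenized query against the stopword list above before terms reach the scorer; the helper name filterStopwords is hypothetical:

// Hypothetical illustration: drop stopwords from a tokenized query,
// roughly the way searchtools.js prunes terms before scoring.
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by",
  "for", "if", "in", "into", "is", "it", "near", "no", "not", "of",
  "on", "or", "such", "that", "the", "their", "then", "there",
  "these", "they", "this", "to", "was", "will", "with"];

function filterStopwords(terms) {
  // Keep only terms that are not in the stopword list (case-insensitive).
  return terms.filter(function (t) {
    return stopwords.indexOf(t.toLowerCase()) === -1;
  });
}

console.log(filterStopwords(["the", "detection", "of", "text"]));
// -> ["detection", "text"]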
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
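To make the searchtools.js changes above concrete: each search result grows from a 6-element to a 7-element tuple, and the new trailing kind field is turned into a kind-* CSS class that themes can target. The sketch below is our own illustration under those assumptions (makeListItemClass is a hypothetical helper, not Sphinx code), mirroring SearchResultKind and the destructuring in _displayItem:

// Sketch of the extended result tuple and the CSS class derived from it.
class SearchResultKind {
  static get index()  { return "index"; }
  static get object() { return "object"; }
  static get text()   { return "text"; }
  static get title()  { return "title"; }
}

// New shape: [docname, title, anchor, descr, score, filename, kind]
const result = [
  "modules/models", "doctr.models", "#doctr-models",
  null, 15, "modules/models.html", SearchResultKind.title,
];

function makeListItemClass(item) {
  const kind = item[6];    // the seventh element is the new `kind` field
  return `kind-${kind}`;   // class name a theme's CSS selector can style
}

console.log(makeListItemClass(result)); // -> "kind-title"

The same ordering helpers still apply: results are sorted by score (reversed, since _displayNextItem pops from the end of the array) and then alphabetically, so appending the kind field does not affect ranking, only presentation.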
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
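Taken together, the CORD hunk above replaces the TensorFlow-bound loader (hard-coded `tf.io` reads and a built-in `collate_fn`) with a framework-agnostic one and introduces three loading modes. A minimal usage sketch, assuming only the post-change keyword arguments shown in the diff (`use_polygons`, `recognition_task`, `detection_task`):

# Sketch of the three CORD loading modes introduced above (post-change API).
from doctr.datasets import CORD

# Default: detection + recognition targets, i.e. dict(boxes=..., labels=[...])
train_set = CORD(train=True, download=True)
img, target = train_set[0]

# Rotated boxes: keep all four (x, y) corners instead of xmin/ymin/xmax/ymax
rotated_set = CORD(train=True, download=True, use_polygons=True)

# Recognition-only: samples become (word crop, text label) pairs
reco_set = CORD(train=True, download=True, recognition_task=True)
crop, label = reco_set[0]

# Per the error raised in the diff, recognition_task and detection_task
# cannot both be True at the same time.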
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
- doctr.datasets.core - docTR documentation
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
\ No newline at end of file
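The deleted `doctr.datasets.core` page above documents the pre-change download-and-extract base class. For orientation, a hypothetical subclass under that old API (the class name, URL, and hash below are placeholders, not real release assets):

# Hypothetical subclass of the removed VisionDataset (pre-change API sketch).
from typing import Any, Dict, List, Tuple

from doctr.datasets.core import VisionDataset  # old module path used above


class MyZipDataset(VisionDataset):
    URL = "https://example.com/my_dataset.zip"  # placeholder URL
    SHA256 = "0" * 64                           # placeholder hash

    def __init__(self, download: bool = False, **kwargs: Any) -> None:
        # extract_archive=True unzips the download; the folder lands in self._root
        super().__init__(self.URL, None, self.SHA256, True, download, **kwargs)
        self.data: List[Tuple[str, Dict[str, Any]]] = []  # filled by scanning self._root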
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# # List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
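The FUNSD migration follows the same pattern as CORD. Worth highlighting is the recognition mode it adds, which flattens each page into (crop, word) pairs and, per the filter in the diff, drops words containing checkbox glyphs. A minimal sketch assuming the post-change API:

# Sketch: FUNSD in recognition mode yields (word crop, transcription) pairs.
from doctr.datasets import FUNSD

reco_set = FUNSD(train=True, download=True, recognition_task=True)
crop, word = reco_set[0]  # crop: image of one word, word: its text label
# Labels containing the filtered glyphs above never make it into reco_set.data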
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before passing it to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
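The loader rewrite above drops the multithreaded `workers` execution in favor of a plain sequential map, adds `__len__`, and exposes `collate_fn` as a constructor argument. A minimal sketch of supplying a custom collate function, assuming the TensorFlow backend and the post-change signature shown in the diff:

# Sketch of the post-change DataLoader with a user-supplied collate_fn.
import tensorflow as tf

from doctr.datasets import DataLoader, FUNSD


def my_collate(samples):
    # Same shape as default_collate: stack images, keep targets as a list.
    # Assumes samples share a common image size (e.g. via img_transforms).
    images, targets = zip(*samples)
    return tf.stack(images, axis=0), list(targets)


train_set = FUNSD(train=True, download=True)
train_loader = DataLoader(train_set, batch_size=32, collate_fn=my_collate)
print(len(train_loader))                    # batch count, via the new __len__
images, targets = next(iter(train_loader))  # one collated batch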
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates: 8 values -> a (4, 2) array of (x, y) points
+ # (top left, top right, bottom right, bottom left corners); blank lines were filtered above
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
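(For reference, a hedged usage sketch of the three loading modes this class now exposes; it mirrors the docstring example and assumes the archives can be downloaded:)

>>> from doctr.datasets import SROIE
>>> full_set = SROIE(train=True, download=True)                        # (img, {"boxes", "labels"})
>>> rec_set = SROIE(train=True, download=True, recognition_task=True)  # (word crop, label) pairs
>>> det_set = SROIE(train=True, download=True, detection_task=True)    # (img, boxes) pairs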
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
+ # if normalization fails or char is still not in vocab, return the unknown character
char = unknown_char
translated += char
return translated
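(A quick illustration of the NFD fallback above, assuming "latin" is a registered vocabulary without accented characters:)

>>> translate("café", "latin")
'cafe'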
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
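(A minimal round trip through the two helpers above, with a toy three-character vocabulary:)

>>> vocab = "abc"
>>> encode_string("cab", vocab)
[2, 0, 1]
>>> decode_sequence([2, 0, 1], vocab)
'cab'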
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
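(A worked example of the padding layout: with vocab "abc" (indices 0-2), eos=3 and pad=4, and no target_size, the width is the longest word plus the reserved symbols:)

>>> encode_sequences(["ab", "c"], vocab="abc", eos=3)
array([[0, 1, 3],
       [2, 3, 3]], dtype=int32)
>>> encode_sequences(["ab", "c"], vocab="abc", eos=3, pad=4)
array([[0, 1, 3, 4],
       [2, 3, 4, 4]], dtype=int32)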
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
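(A hedged usage sketch; "receipt.jpg" is a hypothetical path, and each crop's size follows its box extents:)

>>> import numpy as np
>>> boxes = np.array([[10, 10, 50, 30]])  # one straight box: xmin, ymin, xmax, ymax
>>> crops = crop_bboxes_from_image("receipt.jpg", boxes)
>>> crops[0].shape
(20, 40, 3)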
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their class names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
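(A small sketch of the grouping behaviour, assuming get_img_shape accepts the array below; coordinates become relative to the 100 x 200 page:)

>>> import numpy as np
>>> img = np.zeros((100, 200, 3), dtype=np.uint8)
>>> polys = np.array([[[0, 0], [20, 0], [20, 10], [0, 10]]], dtype=np.float32)
>>> _, boxes = pre_transform_multiclass(img, (polys, ["words"]))
>>> boxes["words"].shape
(1, 4, 2)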
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- doctr.documents.elements - docTR documentation
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
-
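(For context on this since-removed element hierarchy, a minimal composition sketch using the classes above and relative coordinates:)

word = Word("Hello", 0.99, ((0.1, 0.1), (0.3, 0.15)))
line = Line([word])                  # geometry resolved from its words
block = Block(lines=[line])          # geometry resolved from lines (and artefacts)
page = Page(blocks=[block], page_idx=0, dimensions=(595, 842))
doc = Document(pages=[page])
assert doc.render() == "Hello"       # words joined by spaces, lines/blocks/pages by breaks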
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- doctr.documents.reader - docTR documentation
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF file loaded as a fitz.Document
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Default goes to 840 x 595 for A4 pdf;
- if you want to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
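(A usage sketch of this removed helper; "sample.pdf" is a hypothetical path, and the call relies on the legacy PyMuPDF getPixmap API this code targets:)

doc = read_pdf("sample.pdf")
page_img = convert_page_to_numpy(doc[0], output_size=(1024, 726))
print(page_img.shape)  # -> (1024, 726, 3)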
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to unshrink (expand back) polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: the polygon vertices to expand
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
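(To make polygon_to_box's unclip step concrete, a standalone sketch with the default unclip_ratio of 1.5, mirroring rather than calling the removed method:)

import cv2
import numpy as np
import pyclipper
from shapely.geometry import Polygon

points = np.array([[10, 10], [60, 10], [60, 30], [10, 30]])  # a shrunk text region
poly = Polygon(points)
distance = poly.area * 1.5 / poly.length                     # offset distance from area / perimeter
offset = pyclipper.PyclipperOffset()
offset.AddPath(points.tolist(), pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
expanded = np.asarray(offset.Execute(distance)[0])           # grown polygon
x, y, w, h = cv2.boundingRect(expanded.astype(np.int32))     # final 4-value box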
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channels to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1): # top-down pass; note range(len(results) - 1, -1) would never iterate
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature map is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
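(A sanity check of the trigonometry above: law of cosines, then the triangle-area identity. For a point near the segment this yields the perpendicular distance; points whose angle at (xs, ys) is acute fall back to the nearest-endpoint distance:)

import numpy as np

xs, ys = np.array([[0.5]]), np.array([[0.1]])      # point (0.5, 0.1)
a, b = np.array([0.0, 0.0]), np.array([1.0, 0.0])  # unit segment on the x-axis
# DBNet.compute_distance(xs, ys, a, b) -> array([[0.1]]), i.e. the perpendicular distance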
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon : array of coord., to draw the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=np.bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
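(In short, the value returned above is a fixed-weight sum of the three supervision terms defined in this method:)

# total = 10.0 * l1_loss (threshold map) + 5.0 * balanced_bce (probability map) + dice_loss (approx. binary map)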
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Tuple[int, int, int] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
-
- doctr.models.detection.linknet - docTR documentation
-
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (in pixels) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- numpy array of boxes for the bitmap, each box is a 5-element list
- containing xmin, ymin, xmax, ymax, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num): # label 0 is the background
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
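A runnable toy version of the box extraction above, assuming only numpy and opencv-python (none of the names below come from docTR):

import cv2
import numpy as np

bitmap = np.zeros((8, 16), dtype=np.uint8)
bitmap[2:5, 3:9] = 1  # a single fake text blob
num_labels, labels = cv2.connectedComponents(bitmap, connectivity=4)
height, width = bitmap.shape
for label in range(1, num_labels):  # label 0 is the background
    points = np.array(np.where(labels == label)[::-1]).T.astype(np.int32)  # (x, y) pairs
    x, y, w, h = cv2.boundingRect(points)
    # relative coordinates, as in bitmap_to_boxes
    print([x / width, y / height, (x + w) / width, (y + h) / height])
# -> [0.1875, 0.25, 0.5625, 0.625]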
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
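A minimal shape-check of the additive-skip pattern above, with plain (untrained) Keras layers standing in for ResnetStage and decoder_block; illustrative only:

import tensorflow as tf
from tensorflow.keras import layers

enc = [layers.Conv2D(c, 3, strides=2, padding="same") for c in (64, 128, 256, 512)]
dec = [layers.Conv2DTranspose(c, 3, strides=2, padding="same") for c in (64, 64, 128, 256)]

x = tf.random.uniform((1, 128, 128, 3))
feats = []
for stage in enc:          # x_1 .. x_4
    x = stage(x)
    feats.append(x)
y = dec[3](feats[3])       # y_4
y = dec[2](y + feats[2])   # y_3 = decoder_3(y_4 + x_3)
y = dec[1](y + feats[1])   # y_2
y = dec[0](y + feats[0])   # y_1
print(y.shape)             # (1, 128, 128, 64): back to input resolution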
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, fully masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
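The boolean mask simply drops ignored pixels before the BCE reduction; a tiny self-contained check (values are arbitrary):

import tensorflow as tf

logits = tf.constant([[[2.0, -1.0], [0.5, -3.0]]])        # N x H x W, already squeezed
seg_target = tf.constant([[[1.0, 0.0], [1.0, 0.0]]])
seg_mask = tf.constant([[[True, True], [True, False]]])   # last pixel is ignored

loss = tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
    seg_target[seg_mask],  # 1-D tensor of kept pixels
    logits[seg_mask],
    from_logits=True,
))
print(float(loss))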
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
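Per the new signature, arch may now be a model instance rather than a name; a minimal sketch, assuming a docTR install matching the hunk above:

from doctr.models import db_resnet50, detection_predictor

model = db_resnet50(pretrained=True)
predictor = detection_predictor(arch=model, assume_straight_pages=True, batch_size=4)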
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
-
- doctr.models.export - docTR documentation
-
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
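The bytes returned by all three converters above can be run directly with the TFLite interpreter; a minimal sketch with a throwaway model (nothing below is docTR-specific):

import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential, layers

model = Sequential([layers.Conv2D(8, 3, input_shape=(32, 32, 3))])
tflite_bytes = tf.lite.TFLiteConverter.from_keras_model(model).convert()

interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.random.rand(1, 32, 32, 3).astype(np.float32))
interpreter.invoke()
out = interpreter.get_output_details()[0]
print(interpreter.get_tensor(out["index"]).shape)  # (1, 30, 30, 8)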
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
-
- doctr.models.recognition.crnn - docTR documentation
-
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
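A tiny illustration of what tf.nn.ctc_greedy_decoder does with merge_repeated=True (the blank label defaults to the last class); toy values only:

import tensorflow as tf

# 1 sample, 4 timesteps, 3 classes: 'a', 'b', <blank>
logits = tf.math.log(tf.constant([[
    [0.90, 0.05, 0.05],   # 'a'
    [0.90, 0.05, 0.05],   # repeated 'a' -> merged
    [0.05, 0.05, 0.90],   # blank
    [0.05, 0.90, 0.05],   # 'b'
]]))
decoded, _ = tf.nn.ctc_greedy_decoder(
    tf.transpose(logits, perm=[1, 0, 2]),  # time-major, as in ctc_decoder above
    tf.constant([4]),
    merge_repeated=True,
)
print(tf.sparse.to_dense(decoded[0]).numpy())  # [[0 1]] i.e. "ab"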
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Decodes the raw model output with CTC, then maps the decoded indices
- back to characters using the label_to_idx dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of target (ground-truth) strings
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
-
- doctr.models.recognition.sar - docTR documentation
-
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
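In equations, with F the feature map, h the decoder hidden state, and the three convolutions above playing the roles of $W_f$ (3x3), $W_h$ (1x1) and $w$ (1-channel projection):

$$e_{ij} = w^{\top} \tanh\big(W_f F_{ij} + W_h h\big), \qquad \alpha_{ij} = \frac{\exp e_{ij}}{\sum_{i',j'} \exp e_{i'j'}}, \qquad g = \sum_{i,j} \alpha_{ij}\, F_{ij}$$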
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
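In short, for a ground-truth word of length $L$ (plus the EOS step), the masked reduction above computes, per sample:

$$\mathcal{L} = \frac{1}{L + 1} \sum_{t=0}^{L} \mathrm{CE}\big(y_t, \hat{y}_t\big)$$

where cross-entropy terms at timesteps beyond the EOS are zeroed out by the sequence mask.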
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
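As with the detection zoo, arch may be an instantiated recognition model; a minimal sketch under that assumption:

from doctr.models import crnn_vgg16_bn, recognition_predictor

reco_model = crnn_vgg16_bn(pretrained=True)
predictor = recognition_predictor(arch=reco_model, symmetric_pad=True, batch_size=64)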
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
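A usage sketch combining the new options (assuming a docTR install matching this hunk; flags and defaults as documented above):

import numpy as np
from doctr.models import ocr_predictor

model = ocr_predictor(
    pretrained=True,
    assume_straight_pages=False,    # handle rotated text
    export_as_straight_boxes=True,  # but still export axis-aligned boxes
    detect_language=True,
)
page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
result = model([page])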
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `KIEPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
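A matching sketch for the KIE variant (illustrative, not part of the diff): unlike `ocr_predictor`, pages expose a `predictions` mapping keyed by predicted class rather than a block/line hierarchy.
>>> import numpy as np
>>> from doctr.models import kie_predictor
>>> model = kie_predictor(pretrained=True)
>>> page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([page])
>>> for class_name, predictions in out.pages[0].predictions.items():
...     print(class_name, [p.value for p in predictions])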
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example:
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example:
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example:
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example:
- >>> from doctr.transforms import RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
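A worked example of the four tolerance levels (illustrative; `anyascii` transliterates the accented character):
>>> from doctr.utils.metrics import string_match
>>> string_match("Café", "cafe")   # only the lower-cased transliterations agree
(False, False, False, True)
>>> string_match("EUR", "eur")     # case is the only difference
(False, True, False, True)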
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
- self.total += len(gt)
+ self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns:
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
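Worked example (illustrative): 'Hello' vs 'hello' only matches once case is ignored, while 'world' matches at every level, hence the 0.5 scores on the case-sensitive entries.
>>> from doctr.utils.metrics import TextMatch
>>> metric = TextMatch()
>>> metric.update(['Hello', 'world'], ['hello', 'world'])
>>> metric.summary()
{'raw': 0.5, 'caseless': 1.0, 'anyascii': 0.5, 'unicase': 1.0}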
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
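Quick sanity check (illustrative): two 2x2 boxes offset by one unit overlap on a 1x1 region, so IoU = 1 / (4 + 4 - 1) = 1/7.
>>> import numpy as np
>>> from doctr.utils.metrics import box_iou
>>> box_iou(np.array([[0., 0., 2., 2.]]), np.array([[1., 1., 3., 3.]]))  # ~0.143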
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
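The same 1/7 overlap expressed in the rotated (4-point) format (illustrative):
>>> import numpy as np
>>> from doctr.utils.metrics import polygon_iou
>>> polys_1 = np.array([[[0., 0.], [2., 0.], [2., 2.], [0., 2.]]])
>>> polys_2 = np.array([[[1., 1.], [3., 1.], [3., 3.], [1., 3.]]])
>>> polygon_iou(polys_1, polys_2)  # ~0.143, matching the axis-aligned case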
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
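Worked example (illustrative): the second box overlaps the first at IoU 0.81 and is suppressed; the disjoint third box survives.
>>> import numpy as np
>>> from doctr.utils.metrics import nms
>>> boxes = np.array([
...     [0, 0, 10, 10, 0.9],
...     [1, 1, 10, 10, 0.8],
...     [20, 20, 30, 30, 0.7],
... ])
>>> nms(boxes, thresh=0.5)  # keeps indices 0 and 2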
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns:
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- return recall, precision, mean_iou
+
+ return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
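Running the docstring example through the refactored summary (illustrative): the best achievable IoU is 0.49, just under the 0.5 threshold, so recall and precision both come out at 0.0 while the mean IoU is ~0.24.
>>> import numpy as np
>>> from doctr.utils.metrics import LocalizationConfusion
>>> metric = LocalizationConfusion(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
>>> recall, precision, mean_iou = metric.summary()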
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns:
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
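Illustrative example: with perfectly matched boxes, only the string tolerance levels differentiate the scores.
>>> import numpy as np
>>> from doctr.utils.metrics import OCRMetric
>>> metric = OCRMetric(iou_thresh=0.5)
>>> metric.update(np.array([[0., 0., .5, .5]]), np.array([[0., 0., .5, .5]]), ['hello'], ['Hello'])
>>> recall, precision, mean_iou = metric.summary()
>>> recall['raw'], recall['caseless']
(0.0, 1.0)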
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns:
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
- return recall, precision, mean_iou, mean_distance
+ return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
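Illustrative example: a single perfectly localized and correctly classified box yields perfect scores.
>>> import numpy as np
>>> from doctr.utils.metrics import DetectionMetric
>>> metric = DetectionMetric(iou_thresh=0.5)
>>> metric.update(np.array([[0., 0., 1., 1.]]), np.array([[0., 0., 1., 1.]]),
...               np.array([0]), np.array([0]))
>>> metric.summary()
(1.0, 1.0, 1.0)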
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
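Illustrative dispatch with hypothetical coordinates: a 2-point tuple routes to `rect_patch`, a (4, 2) array to `polygon_patch`; page dimensions are (height, width).
>>> import numpy as np
>>> from doctr.utils.visualization import create_obj_patch
>>> page_dim = (600, 800)
>>> create_obj_patch(((0.1, 0.1), (0.4, 0.2)), page_dim)  # -> matplotlib Rectangle
>>> create_obj_patch(np.array([[0.1, 0.1], [0.4, 0.1], [0.4, 0.2], [0.1, 0.2]]), page_dim)  # -> matplotlib Polygon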
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mlp Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mlp Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
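Illustrative call on a hypothetical blank image: boxes are given in relative coordinates and scaled to absolute pixels internally.
>>> import numpy as np
>>> from doctr.utils.visualization import draw_boxes
>>> image = np.zeros((200, 300, 3), dtype=np.uint8)
>>> boxes = np.array([[0.1, 0.1, 0.5, 0.4]])  # relative (xmin, ymin, xmax, ymax)
>>> draw_boxes(boxes, image, color=(0, 255, 0))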
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-.. autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
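As a hedged sketch of how encoding ties these vocabs together (it assumes the VOCABS mapping exposed by doctr.datasets.vocabs):

>>> from doctr.datasets import encode_sequences
>>> from doctr.datasets.vocabs import VOCABS
>>> # map each character to its index in the "latin" vocab, padded to length 10
>>> encoded = encode_sequences(sequences=["hello", "world"], vocab=VOCABS["latin"], target_size=10)
>>> encoded.shape
(2, 10)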
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same horizontal level form two distinct Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * - CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the latest stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developer mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Whether they are performed jointly or separately, each task corresponds to a specific type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, feed it 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up, then measure its average speed on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.12xlarge AWS instance (Xeon Platinum 8275L CPU) to run the experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
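A minimal sketch of that scheme in raw TensorFlow (the target size and normalization statistics here are placeholders, not the values shipped with DocTR's pretrained models):

>>> import tensorflow as tf
>>> pages = [tf.random.uniform((900, 700, 3)), tf.random.uniform((1100, 850, 3))]
>>> # 1. resize each page to the target size (deformation allowed)
>>> resized = [tf.image.resize(p, (1024, 1024), method="bilinear") for p in pages]
>>> # 2. batch images together
>>> batch = tf.stack(resized, axis=0)
>>> # 3. normalize with (placeholder) training data statistics
>>> batch = (batch - tf.constant([0.5, 0.5, 0.5])) / tf.constant([0.3, 0.3, 0.3])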
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (a binary segmentation map, for instance) into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors let you pass numpy images as inputs and get structured information in return.
-
-.. autofunction:: doctr.models.detection.detection_predictor
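For instance, a minimal sketch with a random placeholder page:

>>> import numpy as np
>>> from doctr.models.detection import detection_predictor
>>> predictor = detection_predictor(pretrained=True)
>>> page = (255 * np.random.rand(1024, 1024, 3)).astype(np.uint8)
>>> out = predictor([page])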
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 30595 word-level crops, which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, feed it 100 random tensors of shape [1, 32, 128, 3] as a warm-up, then measure its average speed on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.12xlarge AWS instance (Xeon Platinum 8275L CPU) to run the experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence) into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
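A minimal sketch with a random placeholder crop:

>>> import numpy as np
>>> from doctr.models.recognition import recognition_predictor
>>> predictor = recognition_predictor(pretrained=True)
>>> crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)
>>> out = predictor([crop])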
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, warm up the model, then measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.12xlarge AWS instance (Xeon Platinum 8275L CPU) to run the experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection and one stage of text recognition. The text detection output is used to produce cropped images that are passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
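A minimal end-to-end sketch (the PDF path is a placeholder):

>>> from doctr.documents import DocumentFile
>>> from doctr.models import ocr_predictor
>>> model = ocr_predictor(pretrained=True)
>>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
>>> result = model(pages)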
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
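A hedged sketch of the compression path, assuming the helpers above take a trained Keras model and return a serialized TFLite model:

>>> from doctr.models import db_resnet50
>>> from doctr.models.export import convert_to_tflite
>>> model = db_resnet50(pretrained=True)
>>> serialized_model = convert_to_tflite(model)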
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedures. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
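A minimal composition sketch:

>>> import tensorflow as tf
>>> from doctr.transforms import Compose, Resize
>>> transfo = Compose([Resize((32, 32))])
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))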
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module gathers non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
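A minimal sketch wiring a predictor's output into the visualizer (the page here is a random placeholder):

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from doctr.models import ocr_predictor
>>> from doctr.utils.visualization import visualize_page
>>> model = ocr_predictor(pretrained=True)
>>> page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([page])
>>> visualize_page(out.pages[0].export(), page)
>>> plt.show()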
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model's performance.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
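A minimal sketch, assuming LocalizationConfusion exposes an update/summary pair over absolute box coordinates:

>>> import numpy as np
>>> from doctr.utils.metrics import LocalizationConfusion
>>> metric = LocalizationConfusion(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
>>> recall, precision, mean_iou = metric.summary()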
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
- doctr.datasets - docTR documentation
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its own way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = CORD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-DocTR Vocabs
-
-Name          | size | characters
-digits        | 10   | 0123456789
-ascii_letters | 52   | abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-punctuation   | 32   | !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-currency      | 5    | £€¥¢฿
-latin         | 96   | 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-french        | 154  | 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
- doctr.documents - docTR documentation
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to the page's size
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same horizontal level form two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF as a raw bytes stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
Returns:¶
-
+
diff --git a/latest/modules/models.html b/latest/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/latest/modules/models.html
+++ b/latest/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/latest/modules/transforms.html b/latest/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/latest/modules/transforms.html
+++ b/latest/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶<
-
+
diff --git a/latest/modules/utils.html b/latest/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/latest/modules/utils.html
+++ b/latest/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/latest/notebooks.html b/latest/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/latest/notebooks.html
+++ b/latest/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/latest/search.html b/latest/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/latest/search.html
+++ b/latest/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/latest/searchindex.js b/latest/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/latest/searchindex.js
+++ b/latest/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/latest/using_doctr/custom_models_training.html b/latest/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/latest/using_doctr/custom_models_training.html
+++ b/latest/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/latest/using_doctr/running_on_aws.html b/latest/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/latest/using_doctr/running_on_aws.html
+++ b/latest/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/latest/using_doctr/sharing_models.html b/latest/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/latest/using_doctr/sharing_models.html
+++ b/latest/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/latest/using_doctr/using_contrib_modules.html b/latest/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/latest/using_doctr/using_contrib_modules.html
+++ b/latest/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/latest/using_doctr/using_datasets.html b/latest/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/latest/using_doctr/using_datasets.html
+++ b/latest/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/latest/using_doctr/using_model_export.html b/latest/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/latest/using_doctr/using_model_export.html
+++ b/latest/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/latest/using_doctr/using_models.html b/latest/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/latest/using_doctr/using_models.html
+++ b/latest/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/modules/contrib.html b/modules/contrib.html
index 22b0c508a6..b8878635b6 100644
--- a/modules/contrib.html
+++ b/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -376,7 +376,7 @@ Supported contribution modules
-
+
diff --git a/modules/datasets.html b/modules/datasets.html
index 0fe4b78d48..dfcacbc96e 100644
--- a/modules/datasets.html
+++ b/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1077,7 +1077,7 @@ Returns:
-
+
diff --git a/modules/io.html b/modules/io.html
index 924d292c59..77e9e017bf 100644
--- a/modules/io.html
+++ b/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -756,7 +756,7 @@ Returns:¶
-
+
diff --git a/modules/models.html b/modules/models.html
index bf45d11a71..f4a9833365 100644
--- a/modules/models.html
+++ b/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1598,7 +1598,7 @@ Args:¶
-
+
diff --git a/modules/transforms.html b/modules/transforms.html
index 6d77d16e7b..bc254c867b 100644
--- a/modules/transforms.html
+++ b/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -831,7 +831,7 @@ Args:¶
-
+
diff --git a/modules/utils.html b/modules/utils.html
index 3dd3ecbd96..6784d81f6f 100644
--- a/modules/utils.html
+++ b/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -711,7 +711,7 @@ Args:¶
-
+
diff --git a/notebooks.html b/notebooks.html
index f3ea994e49..647f73d4eb 100644
--- a/notebooks.html
+++ b/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -387,7 +387,7 @@ docTR Notebooks
-
+
diff --git a/search.html b/search.html
index f0693e2c97..0e0da5efb3 100644
--- a/search.html
+++ b/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -336,7 +336,7 @@
-
+
diff --git a/searchindex.js b/searchindex.js
index 8598997441..df18967072 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[1, "correction"]], "2. Warning": [[1, "warning"]], "3. Temporary Ban": [[1, "temporary-ban"]], "4. Permanent Ban": [[1, "permanent-ban"]], "AWS Lambda": [[13, null]], "Advanced options": [[18, "advanced-options"]], "Args:": [[6, "args"], [6, "id4"], [6, "id7"], [6, "id10"], [6, "id13"], [6, "id16"], [6, "id19"], [6, "id22"], [6, "id25"], [6, "id29"], [6, "id32"], [6, "id37"], [6, "id40"], [6, "id46"], [6, "id49"], [6, "id50"], [6, "id51"], [6, "id54"], [6, "id57"], [6, "id60"], [6, "id61"], [7, "args"], [7, "id2"], [7, "id3"], [7, "id4"], [7, "id5"], [7, "id6"], [7, "id7"], [7, "id10"], [7, "id12"], [7, "id14"], [7, "id16"], [7, "id20"], [7, "id24"], [7, "id28"], [8, "args"], [8, "id3"], [8, "id8"], [8, "id13"], [8, "id17"], [8, "id21"], [8, "id26"], [8, "id31"], [8, "id36"], [8, "id41"], [8, "id46"], [8, "id50"], [8, "id54"], [8, "id59"], [8, "id63"], [8, "id68"], [8, "id73"], [8, "id77"], [8, "id81"], [8, "id85"], [8, "id90"], [8, "id95"], [8, "id99"], [8, "id104"], [8, "id109"], [8, "id114"], [8, "id119"], [8, "id123"], [8, "id127"], [8, "id132"], [8, "id137"], [8, "id142"], [8, "id146"], [8, "id150"], [8, "id155"], [8, "id159"], [8, "id163"], [8, "id167"], [8, "id169"], [8, "id171"], [8, "id173"], [9, "args"], [9, "id1"], [9, "id2"], [9, "id3"], [9, "id4"], [9, "id5"], [9, "id6"], [9, "id7"], [9, "id8"], [9, "id9"], [9, "id10"], [9, "id11"], [9, "id12"], [9, "id13"], [9, "id14"], [9, "id15"], [9, "id16"], [9, "id17"], [9, "id18"], [9, "id19"], [10, "args"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"]], "Artefact": [[7, "artefact"]], "ArtefactDetection": [[15, "artefactdetection"]], "Attribution": [[1, "attribution"]], "Available Datasets": [[16, "available-datasets"]], "Available architectures": [[18, "available-architectures"], [18, "id1"], [18, "id2"]], "Available contribution modules": [[15, "available-contribution-modules"]], "Block": [[7, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[16, null]], "Choosing the right model": [[18, null]], "Classification": [[14, "classification"]], "Code quality": [[2, "code-quality"]], "Code style verification": [[2, "code-style-verification"]], "Codebase structure": [[2, "codebase-structure"]], "Commits": [[2, "commits"]], "Composing transformations": [[9, "composing-transformations"]], "Continuous Integration": [[2, "continuous-integration"]], "Contributing to docTR": [[2, null]], "Contributor Covenant Code of Conduct": [[1, null]], "Custom dataset loader": [[6, "custom-dataset-loader"]], "Custom orientation classification models": [[12, "custom-orientation-classification-models"]], "Data Loading": [[16, "data-loading"]], "Dataloader": [[6, "dataloader"]], "Detection": [[14, "detection"], [16, "detection"]], "Detection predictors": [[18, "detection-predictors"]], "Developer mode installation": [[2, "developer-mode-installation"]], "Developing docTR": [[2, "developing-doctr"]], "Document": [[7, "document"]], "Document structure": [[7, "document-structure"]], "End-to-End OCR": [[18, "end-to-end-ocr"]], "Enforcement": [[1, "enforcement"]], "Enforcement Guidelines": [[1, "enforcement-guidelines"]], "Enforcement Responsibilities": [[1, "enforcement-responsibilities"]], "Export to ONNX": [[17, "export-to-onnx"]], "Feature requests & bug report": [[2, "feature-requests-bug-report"]], "Feedback": [[2, "feedback"]], "File reading": [[7, "file-reading"]], "Half-precision": [[17, "half-precision"]], "Installation": [[3, null]], 
"Integrate contributions into your pipeline": [[15, null]], "Let\u2019s connect": [[2, "let-s-connect"]], "Line": [[7, "line"]], "Loading from Huggingface Hub": [[14, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[12, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[12, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[4, "main-features"]], "Model optimization": [[17, "model-optimization"]], "Model zoo": [[4, "model-zoo"]], "Modifying the documentation": [[2, "modifying-the-documentation"]], "Naming conventions": [[14, "naming-conventions"]], "OCR": [[16, "ocr"]], "Object Detection": [[16, "object-detection"]], "Our Pledge": [[1, "our-pledge"]], "Our Standards": [[1, "our-standards"]], "Page": [[7, "page"]], "Preparing your model for inference": [[17, null]], "Prerequisites": [[3, "prerequisites"]], "Pretrained community models": [[14, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[14, "pushing-to-the-huggingface-hub"]], "Questions": [[2, "questions"]], "Recognition": [[14, "recognition"], [16, "recognition"]], "Recognition predictors": [[18, "recognition-predictors"]], "Returns:": [[6, "returns"], [7, "returns"], [7, "id11"], [7, "id13"], [7, "id15"], [7, "id19"], [7, "id23"], [7, "id27"], [7, "id31"], [8, "returns"], [8, "id6"], [8, "id11"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id29"], [8, "id34"], [8, "id39"], [8, "id44"], [8, "id49"], [8, "id53"], [8, "id57"], [8, "id62"], [8, "id66"], [8, "id71"], [8, "id76"], [8, "id80"], [8, "id84"], [8, "id88"], [8, "id93"], [8, "id98"], [8, "id102"], [8, "id107"], [8, "id112"], [8, "id117"], [8, "id122"], [8, "id126"], [8, "id130"], [8, "id135"], [8, "id140"], [8, "id145"], [8, "id149"], [8, "id153"], [8, "id158"], [8, "id162"], [8, "id166"], [8, "id168"], [8, "id170"], [8, "id172"], [10, "returns"]], "Scope": [[1, "scope"]], "Share your model with the community": [[14, null]], "Supported Vocabs": [[6, "supported-vocabs"]], "Supported contribution modules": [[5, "supported-contribution-modules"]], "Supported datasets": [[4, "supported-datasets"]], "Supported transformations": [[9, "supported-transformations"]], "Synthetic dataset generator": [[6, "synthetic-dataset-generator"], [16, "synthetic-dataset-generator"]], "Task evaluation": [[10, "task-evaluation"]], "Text Detection": [[18, "text-detection"]], "Text Recognition": [[18, "text-recognition"]], "Text detection models": [[4, "text-detection-models"]], "Text recognition models": [[4, "text-recognition-models"]], "Train your own model": [[12, null]], "Two-stage approaches": [[18, "two-stage-approaches"]], "Unit tests": [[2, "unit-tests"]], "Use your own datasets": [[16, "use-your-own-datasets"]], "Using your ONNX exported model": [[17, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[3, "via-conda-only-for-linux"]], "Via Git": [[3, "via-git"]], "Via Python Package": [[3, "via-python-package"]], "Visualization": [[10, "visualization"]], "What should I do with the output?": [[18, "what-should-i-do-with-the-output"]], "Word": [[7, "word"]], "docTR Notebooks": [[11, null]], "docTR Vocabs": [[6, "id62"]], "docTR: Document Text Recognition": [[4, null]], "doctr.contrib": [[5, null]], "doctr.datasets": [[6, null], [6, "datasets"]], "doctr.io": [[7, null]], "doctr.models": [[8, null]], "doctr.models.classification": [[8, "doctr-models-classification"]], "doctr.models.detection": [[8, "doctr-models-detection"]], "doctr.models.factory": [[8, 
"doctr-models-factory"]], "doctr.models.recognition": [[8, "doctr-models-recognition"]], "doctr.models.zoo": [[8, "doctr-models-zoo"]], "doctr.transforms": [[9, null]], "doctr.utils": [[10, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[7, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[7, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[9, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[6, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[9, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[9, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[6, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[6, "doctr.datasets.loader.DataLoader", false]], 
"db_mobilenet_v3_large() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[8, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[6, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[6, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[7, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[7, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[6, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[6, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[9, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[9, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[6, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[6, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[6, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[6, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[6, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[8, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[9, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[7, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[6, "doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() 
(in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[9, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[8, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[6, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[9, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[7, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[9, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[9, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[9, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[9, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[9, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[9, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[9, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[9, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[9, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[9, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[9, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[9, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[7, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[7, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[7, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[6, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[9, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[8, 
"doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[7, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[7, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[6, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[6, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[6, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[6, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[9, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[10, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[6, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[7, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[6, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[6, 0, 1, "", "CORD"], [6, 0, 1, "", "CharacterGenerator"], [6, 0, 1, "", "DetectionDataset"], [6, 0, 1, "", "DocArtefacts"], [6, 0, 1, "", "FUNSD"], [6, 0, 1, "", "IC03"], [6, 0, 1, "", "IC13"], [6, 0, 1, "", "IIIT5K"], [6, 0, 1, "", "IIITHWS"], [6, 0, 1, "", "IMGUR5K"], [6, 0, 1, "", "MJSynth"], [6, 0, 1, "", "OCRDataset"], [6, 0, 1, "", "RecognitionDataset"], [6, 0, 1, "", "SROIE"], [6, 0, 1, "", "SVHN"], [6, 0, 1, "", "SVT"], [6, 0, 1, "", "SynthText"], [6, 0, 1, "", 
"WILDRECEIPT"], [6, 0, 1, "", "WordGenerator"], [6, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[6, 0, 1, "", "DataLoader"]], "doctr.io": [[7, 0, 1, "", "Artefact"], [7, 0, 1, "", "Block"], [7, 0, 1, "", "Document"], [7, 0, 1, "", "DocumentFile"], [7, 0, 1, "", "Line"], [7, 0, 1, "", "Page"], [7, 0, 1, "", "Word"], [7, 1, 1, "", "decode_img_as_tensor"], [7, 1, 1, "", "read_html"], [7, 1, 1, "", "read_img_as_numpy"], [7, 1, 1, "", "read_img_as_tensor"], [7, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[7, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[7, 2, 1, "", "from_images"], [7, 2, 1, "", "from_pdf"], [7, 2, 1, "", "from_url"]], "doctr.io.Page": [[7, 2, 1, "", "show"]], "doctr.models": [[8, 1, 1, "", "kie_predictor"], [8, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[8, 1, 1, "", "crop_orientation_predictor"], [8, 1, 1, "", "magc_resnet31"], [8, 1, 1, "", "mobilenet_v3_large"], [8, 1, 1, "", "mobilenet_v3_large_r"], [8, 1, 1, "", "mobilenet_v3_small"], [8, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [8, 1, 1, "", "mobilenet_v3_small_page_orientation"], [8, 1, 1, "", "mobilenet_v3_small_r"], [8, 1, 1, "", "page_orientation_predictor"], [8, 1, 1, "", "resnet18"], [8, 1, 1, "", "resnet31"], [8, 1, 1, "", "resnet34"], [8, 1, 1, "", "resnet50"], [8, 1, 1, "", "textnet_base"], [8, 1, 1, "", "textnet_small"], [8, 1, 1, "", "textnet_tiny"], [8, 1, 1, "", "vgg16_bn_r"], [8, 1, 1, "", "vit_b"], [8, 1, 1, "", "vit_s"]], "doctr.models.detection": [[8, 1, 1, "", "db_mobilenet_v3_large"], [8, 1, 1, "", "db_resnet50"], [8, 1, 1, "", "detection_predictor"], [8, 1, 1, "", "fast_base"], [8, 1, 1, "", "fast_small"], [8, 1, 1, "", "fast_tiny"], [8, 1, 1, "", "linknet_resnet18"], [8, 1, 1, "", "linknet_resnet34"], [8, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[8, 1, 1, "", "from_hub"], [8, 1, 1, "", "login_to_hub"], [8, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[8, 1, 1, "", "crnn_mobilenet_v3_large"], [8, 1, 1, "", "crnn_mobilenet_v3_small"], [8, 1, 1, "", "crnn_vgg16_bn"], [8, 1, 1, "", "master"], [8, 1, 1, "", "parseq"], [8, 1, 1, "", "recognition_predictor"], [8, 1, 1, "", "sar_resnet31"], [8, 1, 1, "", "vitstr_base"], [8, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[9, 0, 1, "", "ChannelShuffle"], [9, 0, 1, "", "ColorInversion"], [9, 0, 1, "", "Compose"], [9, 0, 1, "", "GaussianBlur"], [9, 0, 1, "", "GaussianNoise"], [9, 0, 1, "", "LambdaTransformation"], [9, 0, 1, "", "Normalize"], [9, 0, 1, "", "OneOf"], [9, 0, 1, "", "RandomApply"], [9, 0, 1, "", "RandomBrightness"], [9, 0, 1, "", "RandomContrast"], [9, 0, 1, "", "RandomCrop"], [9, 0, 1, "", "RandomGamma"], [9, 0, 1, "", "RandomHorizontalFlip"], [9, 0, 1, "", "RandomHue"], [9, 0, 1, "", "RandomJpegQuality"], [9, 0, 1, "", "RandomResize"], [9, 0, 1, "", "RandomRotate"], [9, 0, 1, "", "RandomSaturation"], [9, 0, 1, "", "RandomShadow"], [9, 0, 1, "", "Resize"], [9, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[10, 0, 1, "", "DetectionMetric"], [10, 0, 1, "", "LocalizationConfusion"], [10, 0, 1, "", "OCRMetric"], [10, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.OCRMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.visualization": [[10, 1, 1, "", "visualize_page"]]}, 
"objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [1, 7, 8, 10, 14, 17], "0": [1, 3, 6, 9, 10, 12, 15, 16, 18], "00": 18, "01": 18, "0123456789": 6, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "02562": 8, "03": 18, "035": 18, "0361328125": 18, "04": 18, "05": 18, "06": 18, "06640625": 18, "07": 18, "08": [9, 18], "09": 18, "0966796875": 18, "1": [6, 7, 8, 9, 10, 12, 16, 18], "10": [6, 10, 18], "100": [6, 9, 10, 16, 18], "1000": 18, "101": 6, "1024": [8, 12, 18], "104": 6, "106": 6, "108": 6, "1095": 16, "11": 18, "110": 10, "1107": 16, "114": 6, "115": 6, "1156": 16, "116": 6, "118": 6, "11800h": 18, "11th": 18, "12": 18, "120": 6, "123": 6, "126": 6, "1268": 16, "128": [8, 12, 17, 18], "13": 18, "130": 6, "13068": 16, "131": 6, "1337891": 16, "1357421875": 18, "1396484375": 18, "14": 18, "1420": 18, "14470v1": 6, "149": 16, "15": 18, "150": [10, 18], "1552": 18, "16": [8, 17, 18], "1630859375": 18, "1684": 18, "16x16": 8, "17": 18, "1778": 18, "1782": 18, "18": [8, 18], "185546875": 18, "1900": 18, "1910": 8, "19342": 16, "19370": 16, "195": 6, "19598": 16, "199": 18, "1999": 18, "2": [3, 4, 6, 7, 9, 15, 18], "20": 18, "200": 10, "2000": 16, "2003": [4, 6], "2012": 6, "2013": [4, 6], "2015": 6, "2019": 4, "207901": 16, "21": 18, "2103": 6, "2186": 16, "21888": 16, "22": 18, "224": [8, 9], "225": 9, "22672": 16, "229": [9, 16], "23": 18, "233": 16, "236": 6, "24": 18, "246": 16, "249": 16, "25": 18, "2504": 18, "255": [7, 8, 9, 10, 18], "256": 8, "257": 16, "26": 18, "26032": 16, "264": 12, "27": 18, "2700": 16, "2710": 18, "2749": 12, "28": 18, "287": 12, "29": 18, "296": 12, "299": 12, "2d": 18, "3": [3, 4, 7, 8, 9, 10, 17, 18], "30": 18, "300": 16, "3000": 16, "301": 12, "30595": 18, "30ghz": 18, "31": 8, "32": [6, 8, 9, 12, 16, 17, 18], "3232421875": 18, "33": [9, 18], "33402": 16, "33608": 16, "34": [8, 18], "340": 18, "3456": 18, "3515625": 18, "36": 18, "360": 16, "37": [6, 18], "38": 18, "39": 18, "4": [8, 9, 10, 18], "40": 18, "406": 9, "41": 18, "42": 18, "43": 18, "44": 18, "45": 18, "456": 9, "46": 18, "47": 18, "472": 16, "48": [6, 18], "485": 9, "49": 18, "49377": 16, "5": [6, 9, 10, 15, 18], "50": [8, 16, 18], "51": 18, "51171875": 18, "512": 8, "52": [6, 18], "529": 18, "53": 18, "54": 18, "540": 18, "5478515625": 18, "55": 18, "56": 18, "57": 18, "58": [6, 18], "580": 18, "5810546875": 18, "583": 18, "59": 18, "597": 18, "5k": [4, 6], "5m": 18, "6": [9, 18], "60": 9, "600": [8, 10, 18], "61": 18, "62": 18, "626": 16, "63": 18, "64": [8, 9, 18], "641": 18, "647": 16, "65": 18, "66": 18, "67": 18, "68": 18, "69": 18, "693": 12, "694": 12, "695": 12, "6m": 18, "7": 18, "70": [6, 10, 18], "707470": 16, "71": [6, 18], "7100000": 16, "7141797": 16, "7149": 16, "72": 18, "72dpi": 7, "73": 18, "73257": 16, "74": 18, "75": [9, 18], "7581382": 16, "76": 18, "77": 18, "772": 12, "772875": 16, "78": 18, "785": 12, "79": 18, "793533": 16, "796": 16, "798": 12, "7m": 18, "8": [8, 9, 18], "80": 18, "800": [8, 10, 16, 18], "81": 18, "82": 18, "83": 18, "84": 18, "849": 16, "85": 18, "8564453125": 18, "857": 18, "85875": 16, "86": 18, "8603515625": 18, "87": 18, "8707": 16, "88": 18, "89": 18, "9": [3, 9, 18], "90": 18, "90k": 6, "90kdict32px": 6, "91": 18, "914085328578949": 18, "92": 18, "93": 18, "94": [6, 18], "95": 
[10, 18], "9578408598899841": 18, "96": 18, "97": 18, "98": 18, "99": 18, "9949972033500671": 18, "A": [1, 2, 4, 6, 7, 8, 11, 17], "As": 2, "Be": 18, "Being": 1, "By": 13, "For": [1, 2, 3, 12, 18], "If": [2, 7, 8, 12, 18], "In": [2, 6, 16], "It": [9, 14, 15, 17], "Its": [4, 8], "No": [1, 18], "Of": 6, "Or": [15, 17], "The": [1, 2, 6, 7, 10, 13, 15, 16, 17, 18], "Then": 8, "To": [2, 3, 13, 14, 15, 17, 18], "_": [1, 6, 8], "__call__": 18, "_build": 2, "_i": 10, "ab": 6, "abc": 17, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "abdef": [6, 16], "abl": [16, 18], "about": [1, 16, 18], "abov": 18, "abstractdataset": 6, "abus": 1, "accept": 1, "access": [4, 7, 16, 18], "account": [1, 14], "accur": 18, "accuraci": 10, "achiev": 17, "act": 1, "action": 1, "activ": 4, "ad": [2, 8, 9], "adapt": 1, "add": [9, 10, 14, 18], "add_hook": 18, "add_label": 10, "addit": [2, 3, 7, 15, 18], "addition": [2, 18], "address": [1, 7], "adjust": 9, "advanc": 1, "advantag": 17, "advis": 2, "aesthet": [4, 6], "affect": 1, "after": [14, 18], "ag": 1, "again": 8, "aggreg": [10, 16], "aggress": 1, "align": [1, 7, 9], "all": [1, 2, 5, 6, 7, 9, 10, 15, 16, 18], "allow": [1, 17], "along": 18, "alreadi": [2, 17], "also": [1, 8, 14, 15, 16, 18], "alwai": 16, "an": [1, 2, 4, 6, 7, 8, 10, 15, 17, 18], "analysi": [7, 15], "ancient_greek": 6, "angl": [7, 9], "ani": [1, 6, 7, 8, 9, 10, 17, 18], "annot": 6, "anot": 16, "anoth": [8, 12, 16], "answer": 1, "anyascii": 10, "anyon": 4, "anyth": 15, "api": [2, 4], "apolog": 1, "apologi": 1, "app": 2, "appear": 1, "appli": [1, 6, 9], "applic": [4, 8], "appoint": 1, "appreci": 14, "appropri": [1, 2, 18], "ar": [1, 2, 3, 5, 6, 7, 9, 10, 11, 15, 16, 18], "arab": 6, "arabic_diacrit": 6, "arabic_lett": 6, "arabic_punctu": 6, "arbitrarili": [4, 8], "arch": [8, 14], "architectur": [4, 8, 14, 15], "area": 18, "argument": [6, 7, 8, 10, 12, 18], "around": 1, "arrai": [7, 9, 10], "art": [4, 15], "artefact": [10, 15, 18], "artefact_typ": 7, "artifici": [4, 6], "arxiv": [6, 8], "asarrai": 10, "ascii_lett": 6, "aspect": [4, 8, 9, 18], "assess": 10, "assign": 10, "associ": 7, "assum": 8, "assume_straight_pag": [8, 12, 18], "astyp": [8, 10, 18], "attack": 1, "attend": [4, 8], "attent": [1, 8], "autom": 4, "automat": 18, "autoregress": [4, 8], "avail": [1, 4, 5, 9], "averag": [9, 18], "avoid": [1, 3], "aw": [4, 18], "awar": 18, "azur": 18, "b": [8, 10, 18], "b_j": 10, "back": 2, "backbon": 8, "backend": 18, "background": 16, "bangla": 6, "bar": 15, "bar_cod": 16, "base": [4, 8, 15], "baselin": [4, 8, 18], "batch": [6, 8, 9, 15, 16, 18], "batch_siz": [6, 12, 15, 16, 17], "bblanchon": 3, "bbox": 18, "becaus": 13, "been": [2, 10, 16, 18], "befor": [6, 8, 9, 18], "begin": 10, "behavior": [1, 18], "being": [10, 18], "belong": 18, "benchmark": 18, "best": 1, "better": [11, 18], "between": [9, 10, 18], "bgr": 7, "bilinear": 9, "bin_thresh": 18, "binar": [4, 8, 18], "binari": [7, 17, 18], "bit": 17, "block": [10, 18], "block_1_1": 18, "blur": 9, "bmvc": 6, "bn": 14, "bodi": [1, 18], "bool": [6, 7, 8, 9, 10], "boolean": [8, 18], "both": [4, 6, 9, 16, 18], "bottom": [8, 18], "bound": [6, 7, 8, 9, 10, 15, 16, 18], "box": [6, 7, 8, 9, 10, 15, 16, 18], "box_thresh": 18, "bright": 9, "browser": [2, 4], "build": [2, 3, 17], "built": 2, "byte": [7, 18], "c": [3, 7, 10], "c_j": 10, "cach": [2, 6, 13], "cache_sampl": 6, "call": 17, "callabl": [6, 9], "can": [2, 3, 12, 13, 14, 15, 16, 18], "capabl": [2, 11, 18], "case": [6, 10], "cf": 18, "cfg": 18, "challeng": 6, "challenge2_test_task12_imag": 6, 
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[1, "correction"]], "2. Warning": [[1, "warning"]], "3. Temporary Ban": [[1, "temporary-ban"]], "4. Permanent Ban": [[1, "permanent-ban"]], "AWS Lambda": [[13, null]], "Advanced options": [[18, "advanced-options"]], "Args:": [[6, "args"], [6, "id4"], [6, "id7"], [6, "id10"], [6, "id13"], [6, "id16"], [6, "id19"], [6, "id22"], [6, "id25"], [6, "id29"], [6, "id32"], [6, "id37"], [6, "id40"], [6, "id46"], [6, "id49"], [6, "id50"], [6, "id51"], [6, "id54"], [6, "id57"], [6, "id60"], [6, "id61"], [7, "args"], [7, "id2"], [7, "id3"], [7, "id4"], [7, "id5"], [7, "id6"], [7, "id7"], [7, "id10"], [7, "id12"], [7, "id14"], [7, "id16"], [7, "id20"], [7, "id24"], [7, "id28"], [8, "args"], [8, "id3"], [8, "id8"], [8, "id13"], [8, "id17"], [8, "id21"], [8, "id26"], [8, "id31"], [8, "id36"], [8, "id41"], [8, "id46"], [8, "id50"], [8, "id54"], [8, "id59"], [8, "id63"], [8, "id68"], [8, "id73"], [8, "id77"], [8, "id81"], [8, "id85"], [8, "id90"], [8, "id95"], [8, "id99"], [8, "id104"], [8, "id109"], [8, "id114"], [8, "id119"], [8, "id123"], [8, "id127"], [8, "id132"], [8, "id137"], [8, "id142"], [8, "id146"], [8, "id150"], [8, "id155"], [8, "id159"], [8, "id163"], [8, "id167"], [8, "id169"], [8, "id171"], [8, "id173"], [9, "args"], [9, "id1"], [9, "id2"], [9, "id3"], [9, "id4"], [9, "id5"], [9, "id6"], [9, "id7"], [9, "id8"], [9, "id9"], [9, "id10"], [9, "id11"], [9, "id12"], [9, "id13"], [9, "id14"], [9, "id15"], [9, "id16"], [9, "id17"], [9, "id18"], [9, "id19"], [10, "args"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"]], "Artefact": [[7, "artefact"]], "ArtefactDetection": [[15, "artefactdetection"]], "Attribution": [[1, "attribution"]], "Available Datasets": [[16, "available-datasets"]], "Available architectures": [[18, "available-architectures"], [18, "id1"], [18, "id2"]], "Available contribution modules": [[15, "available-contribution-modules"]], "Block": [[7, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[16, null]], "Choosing the right model": [[18, null]], "Classification": [[14, "classification"]], "Code quality": [[2, "code-quality"]], "Code style verification": [[2, "code-style-verification"]], "Codebase structure": [[2, "codebase-structure"]], "Commits": [[2, "commits"]], "Composing transformations": [[9, "composing-transformations"]], "Continuous Integration": [[2, "continuous-integration"]], "Contributing to docTR": [[2, null]], "Contributor Covenant Code of Conduct": [[1, null]], "Custom dataset loader": [[6, "custom-dataset-loader"]], "Custom orientation classification models": [[12, "custom-orientation-classification-models"]], "Data Loading": [[16, "data-loading"]], "Dataloader": [[6, "dataloader"]], "Detection": [[14, "detection"], [16, "detection"]], "Detection predictors": [[18, "detection-predictors"]], "Developer mode installation": [[2, "developer-mode-installation"]], "Developing docTR": [[2, "developing-doctr"]], "Document": [[7, "document"]], "Document structure": [[7, "document-structure"]], "End-to-End OCR": [[18, "end-to-end-ocr"]], "Enforcement": [[1, "enforcement"]], "Enforcement Guidelines": [[1, "enforcement-guidelines"]], "Enforcement Responsibilities": [[1, "enforcement-responsibilities"]], "Export to ONNX": [[17, "export-to-onnx"]], "Feature requests & bug report": [[2, "feature-requests-bug-report"]], "Feedback": [[2, "feedback"]], "File reading": [[7, "file-reading"]], "Half-precision": [[17, "half-precision"]], "Installation": [[3, null]], 
"Integrate contributions into your pipeline": [[15, null]], "Let\u2019s connect": [[2, "let-s-connect"]], "Line": [[7, "line"]], "Loading from Huggingface Hub": [[14, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[12, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[12, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[4, "main-features"]], "Model optimization": [[17, "model-optimization"]], "Model zoo": [[4, "model-zoo"]], "Modifying the documentation": [[2, "modifying-the-documentation"]], "Naming conventions": [[14, "naming-conventions"]], "OCR": [[16, "ocr"]], "Object Detection": [[16, "object-detection"]], "Our Pledge": [[1, "our-pledge"]], "Our Standards": [[1, "our-standards"]], "Page": [[7, "page"]], "Preparing your model for inference": [[17, null]], "Prerequisites": [[3, "prerequisites"]], "Pretrained community models": [[14, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[14, "pushing-to-the-huggingface-hub"]], "Questions": [[2, "questions"]], "Recognition": [[14, "recognition"], [16, "recognition"]], "Recognition predictors": [[18, "recognition-predictors"]], "Returns:": [[6, "returns"], [7, "returns"], [7, "id11"], [7, "id13"], [7, "id15"], [7, "id19"], [7, "id23"], [7, "id27"], [7, "id31"], [8, "returns"], [8, "id6"], [8, "id11"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id29"], [8, "id34"], [8, "id39"], [8, "id44"], [8, "id49"], [8, "id53"], [8, "id57"], [8, "id62"], [8, "id66"], [8, "id71"], [8, "id76"], [8, "id80"], [8, "id84"], [8, "id88"], [8, "id93"], [8, "id98"], [8, "id102"], [8, "id107"], [8, "id112"], [8, "id117"], [8, "id122"], [8, "id126"], [8, "id130"], [8, "id135"], [8, "id140"], [8, "id145"], [8, "id149"], [8, "id153"], [8, "id158"], [8, "id162"], [8, "id166"], [8, "id168"], [8, "id170"], [8, "id172"], [10, "returns"]], "Scope": [[1, "scope"]], "Share your model with the community": [[14, null]], "Supported Vocabs": [[6, "supported-vocabs"]], "Supported contribution modules": [[5, "supported-contribution-modules"]], "Supported datasets": [[4, "supported-datasets"]], "Supported transformations": [[9, "supported-transformations"]], "Synthetic dataset generator": [[6, "synthetic-dataset-generator"], [16, "synthetic-dataset-generator"]], "Task evaluation": [[10, "task-evaluation"]], "Text Detection": [[18, "text-detection"]], "Text Recognition": [[18, "text-recognition"]], "Text detection models": [[4, "text-detection-models"]], "Text recognition models": [[4, "text-recognition-models"]], "Train your own model": [[12, null]], "Two-stage approaches": [[18, "two-stage-approaches"]], "Unit tests": [[2, "unit-tests"]], "Use your own datasets": [[16, "use-your-own-datasets"]], "Using your ONNX exported model": [[17, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[3, "via-conda-only-for-linux"]], "Via Git": [[3, "via-git"]], "Via Python Package": [[3, "via-python-package"]], "Visualization": [[10, "visualization"]], "What should I do with the output?": [[18, "what-should-i-do-with-the-output"]], "Word": [[7, "word"]], "docTR Notebooks": [[11, null]], "docTR Vocabs": [[6, "id62"]], "docTR: Document Text Recognition": [[4, null]], "doctr.contrib": [[5, null]], "doctr.datasets": [[6, null], [6, "datasets"]], "doctr.io": [[7, null]], "doctr.models": [[8, null]], "doctr.models.classification": [[8, "doctr-models-classification"]], "doctr.models.detection": [[8, "doctr-models-detection"]], "doctr.models.factory": [[8, 
"doctr-models-factory"]], "doctr.models.recognition": [[8, "doctr-models-recognition"]], "doctr.models.zoo": [[8, "doctr-models-zoo"]], "doctr.transforms": [[9, null]], "doctr.utils": [[10, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[7, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[7, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[9, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[6, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[9, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[9, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[6, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[6, "doctr.datasets.loader.DataLoader", false]], 
"db_mobilenet_v3_large() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[8, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[6, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[6, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[7, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[7, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[6, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[6, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[9, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[9, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[6, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[6, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[6, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[6, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[6, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[8, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[9, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[7, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[6, "doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() 
(in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[9, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[8, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[6, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[9, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[7, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[9, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[9, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[9, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[9, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[9, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[9, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[9, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[9, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[9, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[9, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[9, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[9, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[7, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[7, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[7, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[6, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[9, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[8, 
"doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[7, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[7, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[6, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[6, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[6, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[6, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[9, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[10, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[6, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[7, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[6, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[6, 0, 1, "", "CORD"], [6, 0, 1, "", "CharacterGenerator"], [6, 0, 1, "", "DetectionDataset"], [6, 0, 1, "", "DocArtefacts"], [6, 0, 1, "", "FUNSD"], [6, 0, 1, "", "IC03"], [6, 0, 1, "", "IC13"], [6, 0, 1, "", "IIIT5K"], [6, 0, 1, "", "IIITHWS"], [6, 0, 1, "", "IMGUR5K"], [6, 0, 1, "", "MJSynth"], [6, 0, 1, "", "OCRDataset"], [6, 0, 1, "", "RecognitionDataset"], [6, 0, 1, "", "SROIE"], [6, 0, 1, "", "SVHN"], [6, 0, 1, "", "SVT"], [6, 0, 1, "", "SynthText"], [6, 0, 1, "", 
"WILDRECEIPT"], [6, 0, 1, "", "WordGenerator"], [6, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[6, 0, 1, "", "DataLoader"]], "doctr.io": [[7, 0, 1, "", "Artefact"], [7, 0, 1, "", "Block"], [7, 0, 1, "", "Document"], [7, 0, 1, "", "DocumentFile"], [7, 0, 1, "", "Line"], [7, 0, 1, "", "Page"], [7, 0, 1, "", "Word"], [7, 1, 1, "", "decode_img_as_tensor"], [7, 1, 1, "", "read_html"], [7, 1, 1, "", "read_img_as_numpy"], [7, 1, 1, "", "read_img_as_tensor"], [7, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[7, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[7, 2, 1, "", "from_images"], [7, 2, 1, "", "from_pdf"], [7, 2, 1, "", "from_url"]], "doctr.io.Page": [[7, 2, 1, "", "show"]], "doctr.models": [[8, 1, 1, "", "kie_predictor"], [8, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[8, 1, 1, "", "crop_orientation_predictor"], [8, 1, 1, "", "magc_resnet31"], [8, 1, 1, "", "mobilenet_v3_large"], [8, 1, 1, "", "mobilenet_v3_large_r"], [8, 1, 1, "", "mobilenet_v3_small"], [8, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [8, 1, 1, "", "mobilenet_v3_small_page_orientation"], [8, 1, 1, "", "mobilenet_v3_small_r"], [8, 1, 1, "", "page_orientation_predictor"], [8, 1, 1, "", "resnet18"], [8, 1, 1, "", "resnet31"], [8, 1, 1, "", "resnet34"], [8, 1, 1, "", "resnet50"], [8, 1, 1, "", "textnet_base"], [8, 1, 1, "", "textnet_small"], [8, 1, 1, "", "textnet_tiny"], [8, 1, 1, "", "vgg16_bn_r"], [8, 1, 1, "", "vit_b"], [8, 1, 1, "", "vit_s"]], "doctr.models.detection": [[8, 1, 1, "", "db_mobilenet_v3_large"], [8, 1, 1, "", "db_resnet50"], [8, 1, 1, "", "detection_predictor"], [8, 1, 1, "", "fast_base"], [8, 1, 1, "", "fast_small"], [8, 1, 1, "", "fast_tiny"], [8, 1, 1, "", "linknet_resnet18"], [8, 1, 1, "", "linknet_resnet34"], [8, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[8, 1, 1, "", "from_hub"], [8, 1, 1, "", "login_to_hub"], [8, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[8, 1, 1, "", "crnn_mobilenet_v3_large"], [8, 1, 1, "", "crnn_mobilenet_v3_small"], [8, 1, 1, "", "crnn_vgg16_bn"], [8, 1, 1, "", "master"], [8, 1, 1, "", "parseq"], [8, 1, 1, "", "recognition_predictor"], [8, 1, 1, "", "sar_resnet31"], [8, 1, 1, "", "vitstr_base"], [8, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[9, 0, 1, "", "ChannelShuffle"], [9, 0, 1, "", "ColorInversion"], [9, 0, 1, "", "Compose"], [9, 0, 1, "", "GaussianBlur"], [9, 0, 1, "", "GaussianNoise"], [9, 0, 1, "", "LambdaTransformation"], [9, 0, 1, "", "Normalize"], [9, 0, 1, "", "OneOf"], [9, 0, 1, "", "RandomApply"], [9, 0, 1, "", "RandomBrightness"], [9, 0, 1, "", "RandomContrast"], [9, 0, 1, "", "RandomCrop"], [9, 0, 1, "", "RandomGamma"], [9, 0, 1, "", "RandomHorizontalFlip"], [9, 0, 1, "", "RandomHue"], [9, 0, 1, "", "RandomJpegQuality"], [9, 0, 1, "", "RandomResize"], [9, 0, 1, "", "RandomRotate"], [9, 0, 1, "", "RandomSaturation"], [9, 0, 1, "", "RandomShadow"], [9, 0, 1, "", "Resize"], [9, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[10, 0, 1, "", "DetectionMetric"], [10, 0, 1, "", "LocalizationConfusion"], [10, 0, 1, "", "OCRMetric"], [10, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.OCRMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.visualization": [[10, 1, 1, "", "visualize_page"]]}, 
"objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [1, 7, 8, 10, 14, 17], "0": [1, 3, 6, 9, 10, 12, 15, 16, 18], "00": 18, "01": 18, "0123456789": 6, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "02562": 8, "03": 18, "035": 18, "0361328125": 18, "04": 18, "05": 18, "06": 18, "06640625": 18, "07": 18, "08": [9, 18], "09": 18, "0966796875": 18, "1": [6, 7, 8, 9, 10, 12, 16, 18], "10": [6, 10, 18], "100": [6, 9, 10, 16, 18], "1000": 18, "101": 6, "1024": [8, 12, 18], "104": 6, "106": 6, "108": 6, "1095": 16, "11": 18, "110": 10, "1107": 16, "114": 6, "115": 6, "1156": 16, "116": 6, "118": 6, "11800h": 18, "11th": 18, "12": 18, "120": 6, "123": 6, "126": 6, "1268": 16, "128": [8, 12, 17, 18], "13": 18, "130": 6, "13068": 16, "131": 6, "1337891": 16, "1357421875": 18, "1396484375": 18, "14": 18, "1420": 18, "14470v1": 6, "149": 16, "15": 18, "150": [10, 18], "1552": 18, "16": [8, 17, 18], "1630859375": 18, "1684": 18, "16x16": 8, "17": 18, "1778": 18, "1782": 18, "18": [8, 18], "185546875": 18, "1900": 18, "1910": 8, "19342": 16, "19370": 16, "195": 6, "19598": 16, "199": 18, "1999": 18, "2": [3, 4, 6, 7, 9, 15, 18], "20": 18, "200": 10, "2000": 16, "2003": [4, 6], "2012": 6, "2013": [4, 6], "2015": 6, "2019": 4, "207901": 16, "21": 18, "2103": 6, "2186": 16, "21888": 16, "22": 18, "224": [8, 9], "225": 9, "22672": 16, "229": [9, 16], "23": 18, "233": 16, "236": 6, "24": 18, "246": 16, "249": 16, "25": 18, "2504": 18, "255": [7, 8, 9, 10, 18], "256": 8, "257": 16, "26": 18, "26032": 16, "264": 12, "27": 18, "2700": 16, "2710": 18, "2749": 12, "28": 18, "287": 12, "29": 18, "296": 12, "299": 12, "2d": 18, "3": [3, 4, 7, 8, 9, 10, 17, 18], "30": 18, "300": 16, "3000": 16, "301": 12, "30595": 18, "30ghz": 18, "31": 8, "32": [6, 8, 9, 12, 16, 17, 18], "3232421875": 18, "33": [9, 18], "33402": 16, "33608": 16, "34": [8, 18], "340": 18, "3456": 18, "3515625": 18, "36": 18, "360": 16, "37": [6, 18], "38": 18, "39": 18, "4": [8, 9, 10, 18], "40": 18, "406": 9, "41": 18, "42": 18, "43": 18, "44": 18, "45": 18, "456": 9, "46": 18, "47": 18, "472": 16, "48": [6, 18], "485": 9, "49": 18, "49377": 16, "5": [6, 9, 10, 15, 18], "50": [8, 16, 18], "51": 18, "51171875": 18, "512": 8, "52": [6, 18], "529": 18, "53": 18, "54": 18, "540": 18, "5478515625": 18, "55": 18, "56": 18, "57": 18, "58": [6, 18], "580": 18, "5810546875": 18, "583": 18, "59": 18, "597": 18, "5k": [4, 6], "5m": 18, "6": [9, 18], "60": 9, "600": [8, 10, 18], "61": 18, "62": 18, "626": 16, "63": 18, "64": [8, 9, 18], "641": 18, "647": 16, "65": 18, "66": 18, "67": 18, "68": 18, "69": 18, "693": 12, "694": 12, "695": 12, "6m": 18, "7": 18, "70": [6, 10, 18], "707470": 16, "71": [6, 18], "7100000": 16, "7141797": 16, "7149": 16, "72": 18, "72dpi": 7, "73": 18, "73257": 16, "74": 18, "75": [9, 18], "7581382": 16, "76": 18, "77": 18, "772": 12, "772875": 16, "78": 18, "785": 12, "79": 18, "793533": 16, "796": 16, "798": 12, "7m": 18, "8": [8, 9, 18], "80": 18, "800": [8, 10, 16, 18], "81": 18, "82": 18, "83": 18, "84": 18, "849": 16, "85": 18, "8564453125": 18, "857": 18, "85875": 16, "86": 18, "8603515625": 18, "87": 18, "8707": 16, "88": 18, "89": 18, "9": [3, 9, 18], "90": 18, "90k": 6, "90kdict32px": 6, "91": 18, "914085328578949": 18, "92": 18, "93": 18, "94": [6, 18], "95": 
[10, 18], "9578408598899841": 18, "96": 18, "97": 18, "98": 18, "99": 18, "9949972033500671": 18, "A": [1, 2, 4, 6, 7, 8, 11, 17], "As": 2, "Be": 18, "Being": 1, "By": 13, "For": [1, 2, 3, 12, 18], "If": [2, 7, 8, 12, 18], "In": [2, 6, 16], "It": [9, 14, 15, 17], "Its": [4, 8], "No": [1, 18], "Of": 6, "Or": [15, 17], "The": [1, 2, 6, 7, 10, 13, 15, 16, 17, 18], "Then": 8, "To": [2, 3, 13, 14, 15, 17, 18], "_": [1, 6, 8], "__call__": 18, "_build": 2, "_i": 10, "ab": 6, "abc": 17, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "abdef": [6, 16], "abl": [16, 18], "about": [1, 16, 18], "abov": 18, "abstractdataset": 6, "abus": 1, "accept": 1, "access": [4, 7, 16, 18], "account": [1, 14], "accur": 18, "accuraci": 10, "achiev": 17, "act": 1, "action": 1, "activ": 4, "ad": [2, 8, 9], "adapt": 1, "add": [9, 10, 14, 18], "add_hook": 18, "add_label": 10, "addit": [2, 3, 7, 15, 18], "addition": [2, 18], "address": [1, 7], "adjust": 9, "advanc": 1, "advantag": 17, "advis": 2, "aesthet": [4, 6], "affect": 1, "after": [14, 18], "ag": 1, "again": 8, "aggreg": [10, 16], "aggress": 1, "align": [1, 7, 9], "all": [1, 2, 5, 6, 7, 9, 10, 15, 16, 18], "allow": [1, 17], "along": 18, "alreadi": [2, 17], "also": [1, 8, 14, 15, 16, 18], "alwai": 16, "an": [1, 2, 4, 6, 7, 8, 10, 15, 17, 18], "analysi": [7, 15], "ancient_greek": 6, "angl": [7, 9], "ani": [1, 6, 7, 8, 9, 10, 17, 18], "annot": 6, "anot": 16, "anoth": [8, 12, 16], "answer": 1, "anyascii": 10, "anyon": 4, "anyth": 15, "api": [2, 4], "apolog": 1, "apologi": 1, "app": 2, "appear": 1, "appli": [1, 6, 9], "applic": [4, 8], "appoint": 1, "appreci": 14, "appropri": [1, 2, 18], "ar": [1, 2, 3, 5, 6, 7, 9, 10, 11, 15, 16, 18], "arab": 6, "arabic_diacrit": 6, "arabic_lett": 6, "arabic_punctu": 6, "arbitrarili": [4, 8], "arch": [8, 14], "architectur": [4, 8, 14, 15], "area": 18, "argument": [6, 7, 8, 10, 12, 18], "around": 1, "arrai": [7, 9, 10], "art": [4, 15], "artefact": [10, 15, 18], "artefact_typ": 7, "artifici": [4, 6], "arxiv": [6, 8], "asarrai": 10, "ascii_lett": 6, "aspect": [4, 8, 9, 18], "assess": 10, "assign": 10, "associ": 7, "assum": 8, "assume_straight_pag": [8, 12, 18], "astyp": [8, 10, 18], "attack": 1, "attend": [4, 8], "attent": [1, 8], "autom": 4, "automat": 18, "autoregress": [4, 8], "avail": [1, 4, 5, 9], "averag": [9, 18], "avoid": [1, 3], "aw": [4, 18], "awar": 18, "azur": 18, "b": [8, 10, 18], "b_j": 10, "back": 2, "backbon": 8, "backend": 18, "background": 16, "bangla": 6, "bar": 15, "bar_cod": 16, "base": [4, 8, 15], "baselin": [4, 8, 18], "batch": [6, 8, 9, 15, 16, 18], "batch_siz": [6, 12, 15, 16, 17], "bblanchon": 3, "bbox": 18, "becaus": 13, "been": [2, 10, 16, 18], "befor": [6, 8, 9, 18], "begin": 10, "behavior": [1, 18], "being": [10, 18], "belong": 18, "benchmark": 18, "best": 1, "better": [11, 18], "between": [9, 10, 18], "bgr": 7, "bilinear": 9, "bin_thresh": 18, "binar": [4, 8, 18], "binari": [7, 17, 18], "bit": 17, "block": [10, 18], "block_1_1": 18, "blur": 9, "bmvc": 6, "bn": 14, "bodi": [1, 18], "bool": [6, 7, 8, 9, 10], "boolean": [8, 18], "both": [4, 6, 9, 16, 18], "bottom": [8, 18], "bound": [6, 7, 8, 9, 10, 15, 16, 18], "box": [6, 7, 8, 9, 10, 15, 16, 18], "box_thresh": 18, "bright": 9, "browser": [2, 4], "build": [2, 3, 17], "built": 2, "byte": [7, 18], "c": [3, 7, 10], "c_j": 10, "cach": [2, 6, 13], "cache_sampl": 6, "call": 17, "callabl": [6, 9], "can": [2, 3, 12, 13, 14, 15, 16, 18], "capabl": [2, 11, 18], "case": [6, 10], "cf": 18, "cfg": 18, "challeng": 6, "challenge2_test_task12_imag": 6, 
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
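Note: the replaced searchindex.js payload above is a single minified object handed to Search.setIndex. Its shape is visible in the payload itself: "alltitles" maps section titles to [document, anchor] pairs, "docnames" lists page stems, and "titles" holds per-document titles. A minimal sketch of resolving a title entry against that shape (the resolveTitle helper is hypothetical, not part of the library):

    // Sketch only: "index" stands for the object passed to Search.setIndex(...)
    // and is assumed to have the shape shown in the payload above.
    function resolveTitle(index, title) {
      // alltitles: { "Some Title": [[docIndex, anchorOrNull], ...] }
      return (index.alltitles[title] || []).map(([doc, anchor]) => ({
        page: index.docnames[doc] + ".html",
        anchor: anchor,
      }));
    }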
diff --git a/using_doctr/custom_models_training.html b/using_doctr/custom_models_training.html
index 580b4368b7..e664c6a950 100644
--- a/using_doctr/custom_models_training.html
+++ b/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -615,7 +615,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/using_doctr/running_on_aws.html b/using_doctr/running_on_aws.html
index ddb0c3c80f..81c38b49f5 100644
--- a/using_doctr/running_on_aws.html
+++ b/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -358,7 +358,7 @@ AWS Lambda
-
+
diff --git a/using_doctr/sharing_models.html b/using_doctr/sharing_models.html
index 07a3b2f2a3..4f5d1d68a5 100644
--- a/using_doctr/sharing_models.html
+++ b/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -540,7 +540,7 @@ Recognition
-
+
diff --git a/using_doctr/using_contrib_modules.html b/using_doctr/using_contrib_modules.html
index b4a10925e6..cf282ff3a4 100644
--- a/using_doctr/using_contrib_modules.html
+++ b/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -411,7 +411,7 @@ ArtefactDetection
-
+
diff --git a/using_doctr/using_datasets.html b/using_doctr/using_datasets.html
index 4a52df36ba..e30b6d6459 100644
--- a/using_doctr/using_datasets.html
+++ b/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -638,7 +638,7 @@ Data Loading
-
+
diff --git a/using_doctr/using_model_export.html b/using_doctr/using_model_export.html
index 2b30ee63a1..ad9d09ed4c 100644
--- a/using_doctr/using_model_export.html
+++ b/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -463,7 +463,7 @@ Using your ONNX exported model
-
+
diff --git a/using_doctr/using_models.html b/using_doctr/using_models.html
index 13cb06116b..5c80dbf62d 100644
--- a/using_doctr/using_models.html
+++ b/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1249,7 +1249,7 @@ Advanced options
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/cord.html b/v0.1.0/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.0/_modules/doctr/datasets/cord.html
+++ b/v0.1.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/detection.html b/v0.1.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.0/_modules/doctr/datasets/detection.html
+++ b/v0.1.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/funsd.html b/v0.1.0/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.0/_modules/doctr/datasets/funsd.html
+++ b/v0.1.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic03.html b/v0.1.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.0/_modules/doctr/datasets/ic03.html
+++ b/v0.1.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic13.html b/v0.1.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.0/_modules/doctr/datasets/ic13.html
+++ b/v0.1.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiit5k.html b/v0.1.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiithws.html b/v0.1.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/imgur5k.html b/v0.1.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/loader.html b/v0.1.0/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.0/_modules/doctr/datasets/loader.html
+++ b/v0.1.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/mjsynth.html b/v0.1.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ocr.html b/v0.1.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.0/_modules/doctr/datasets/ocr.html
+++ b/v0.1.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/recognition.html b/v0.1.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.0/_modules/doctr/datasets/recognition.html
+++ b/v0.1.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/sroie.html b/v0.1.0/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.0/_modules/doctr/datasets/sroie.html
+++ b/v0.1.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svhn.html b/v0.1.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.0/_modules/doctr/datasets/svhn.html
+++ b/v0.1.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svt.html b/v0.1.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.0/_modules/doctr/datasets/svt.html
+++ b/v0.1.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/synthtext.html b/v0.1.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/utils.html b/v0.1.0/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.0/_modules/doctr/datasets/utils.html
+++ b/v0.1.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/wildreceipt.html b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.0/_modules/doctr/io/elements.html b/v0.1.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.0/_modules/doctr/io/elements.html
+++ b/v0.1.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.0/_modules/doctr/io/html.html b/v0.1.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.0/_modules/doctr/io/html.html
+++ b/v0.1.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/base.html b/v0.1.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.0/_modules/doctr/io/image/base.html
+++ b/v0.1.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/tensorflow.html b/v0.1.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/io/pdf.html b/v0.1.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.0/_modules/doctr/io/pdf.html
+++ b/v0.1.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.0/_modules/doctr/io/reader.html b/v0.1.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.0/_modules/doctr/io/reader.html
+++ b/v0.1.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/zoo.html b/v0.1.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/zoo.html b/v0.1.0/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/factory/hub.html b/v0.1.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.0/_modules/doctr/models/factory/hub.html
+++ b/v0.1.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/zoo.html b/v0.1.0/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/zoo.html b/v0.1.0/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.0/_modules/doctr/models/zoo.html
+++ b/v0.1.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/base.html b/v0.1.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/utils/metrics.html b/v0.1.0/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.0/_modules/doctr/utils/metrics.html
+++ b/v0.1.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.0/_modules/doctr/utils/visualization.html b/v0.1.0/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.0/_modules/doctr/utils/visualization.html
+++ b/v0.1.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.0/_modules/index.html b/v0.1.0/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.0/_modules/index.html
+++ b/v0.1.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.0/_sources/getting_started/installing.rst.txt b/v0.1.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.0/_sources/getting_started/installing.rst.txt
+++ b/v0.1.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.0/_static/basic.css b/v0.1.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.0/_static/basic.css
+++ b/v0.1.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.0/_static/doctools.js b/v0.1.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.0/_static/doctools.js
+++ b/v0.1.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.0/_static/language_data.js b/v0.1.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.0/_static/language_data.js
+++ b/v0.1.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
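Note: the trimmed language_data.js above still ships the English stopword list consumed by the search tooling. A minimal sketch of the kind of query filtering such a list enables (an assumption about usage, not the exact searchtools.js code path):

    // Sketch only: drop stopwords from a raw query before scoring.
    // Relies on the `stopwords` array defined in language_data.js above.
    function significantTerms(query) {
      return query
        .toLowerCase()
        .split(/\s+/)
        .filter((term) => term.length > 0 && !stopwords.includes(term));
    }
    // significantTerms("the quick fox") returns ["quick", "fox"]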
diff --git a/v0.1.0/_static/searchtools.js b/v0.1.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.0/_static/searchtools.js
+++ b/v0.1.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
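Note: taken together, the searchtools.js hunks above thread a result kind (title, index, object, or text) through every result tuple and surface it as a kind-* class on each rendered list item; per the new comment, that class is meant for theme CSS selectors. A minimal sketch of a theme consuming it (the styling choice is illustrative only):

    // Sketch only: emphasize title matches in the rendered result list,
    // using the kind-* class added by _displayItem above.
    document
      .querySelectorAll(`#search-results ul.search li.kind-${SearchResultKind.title}`)
      .forEach((li) => { li.style.fontWeight = "bold"; });

A plain CSS rule on li.kind-title would achieve the same effect without any scripting.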
diff --git a/v0.1.0/changelog.html b/v0.1.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.0/changelog.html
+++ b/v0.1.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.0/community/resources.html b/v0.1.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.0/community/resources.html
+++ b/v0.1.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.0/contributing/code_of_conduct.html b/v0.1.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.0/contributing/code_of_conduct.html
+++ b/v0.1.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.0/contributing/contributing.html b/v0.1.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.0/contributing/contributing.html
+++ b/v0.1.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.0/genindex.html b/v0.1.0/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.0/genindex.html
+++ b/v0.1.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.0/getting_started/installing.html b/v0.1.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.0/getting_started/installing.html
+++ b/v0.1.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.0/index.html b/v0.1.0/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.0/index.html
+++ b/v0.1.0/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.0/modules/contrib.html b/v0.1.0/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.0/modules/contrib.html
+++ b/v0.1.0/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.0/modules/datasets.html b/v0.1.0/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.0/modules/datasets.html
+++ b/v0.1.0/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.0/modules/io.html b/v0.1.0/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.0/modules/io.html
+++ b/v0.1.0/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.0/modules/models.html b/v0.1.0/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.0/modules/models.html
+++ b/v0.1.0/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/transforms.html b/v0.1.0/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.0/modules/transforms.html
+++ b/v0.1.0/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.0/using_doctr/custom_models_training.html b/v0.1.0/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.0/using_doctr/custom_models_training.html
+++ b/v0.1.0/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.0/using_doctr/running_on_aws.html b/v0.1.0/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.0/using_doctr/running_on_aws.html
+++ b/v0.1.0/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
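Each search result flowing through the query pipeline above is now a 7-element array, [docname, title, anchor, descr, score, filename, kind], and _displayItem tags every generated <li> with a kind-<value> class. A minimal sketch of how a theme's JavaScript could consume that class, assuming the SearchResultKind values shipped in searchtools.js above (KIND_LABELS and decorateResult are illustrative names, not part of Sphinx):

// Sketch only: reads the new trailing "kind" slot of a search result.
const KIND_LABELS = {
  index: "Index entry",
  object: "API object",
  text: "Page text",
  title: "Page title",
};

function decorateResult(item) {
  // Results are [docname, title, anchor, descr, score, filename, kind].
  const [docName, title, anchor, descr, score, filename, kind] = item;
  const li = document.createElement("li");
  li.classList.add(`kind-${kind}`); // same class _displayItem adds above
  li.textContent = `${title} [${KIND_LABELS[kind] ?? kind}, score ${score}]`;
  return li;
}

// Usage (browser context, once an ul.search list exists on the page):
// document.querySelector("ul.search").append(
//   decorateResult(["index", "docTR documentation", "", null, 15, "index.html", "title"])
// );

Because _displayItem attaches the class unconditionally, a theme can opt in purely through CSS selectors such as li.kind-title without touching the search pipeline itself.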
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
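+ # Each split is defined as (url, sha256 checksum, local archive name)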
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
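+ # Build samples for the requested task: word crops with text for recognition, boxes only for detection, full targets otherwise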
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
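The rewritten CORD loader above adds use_polygons, recognition_task and detection_task flags on top of the original train switch. As a minimal usage sketch, assuming a working docTR install (the two task flags are mutually exclusive, as the ValueError above enforces):

    from doctr.datasets import CORD

    # Default: full samples, each target a dict with "boxes" and "labels"
    train_set = CORD(train=True, download=True)
    img, target = train_set[0]

    # Recognition variant: pre-cropped words paired with their transcriptions
    reco_set = CORD(train=True, download=True, recognition_task=True)
    crop, text = reco_set[0]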
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
- doctr.datasets.core - docTR documentation
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
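The straight-box-to-polygon conversion above is mechanical; an isolated sketch in plain NumPy (box_to_polygon is a hypothetical helper name, mirroring the list comprehension in this hunk):

import numpy as np

def box_to_polygon(box):
    # (xmin, ymin, xmax, ymax) -> corners: top left, top right, bottom right, bottom left
    xmin, ymin, xmax, ymax = box
    return np.array([[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]], dtype=np.float32)

print(box_to_polygon([10, 20, 50, 40]))
# [[10. 20.]
#  [50. 20.]
#  [50. 40.]
#  [10. 40.]]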
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before passing it to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
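A short usage sketch of the refactored loader (TensorFlow backend assumed), exercising the new `collate_fn` hook and `__len__`; the custom collate follows the same contract as `default_collate` above:

import tensorflow as tf
from doctr.datasets import CORD, DataLoader

# in practice, pass a resize transform to the dataset so images share a shape
train_set = CORD(train=True, download=True)

def my_collate(samples):
    # stack images into one batch tensor, keep targets as a plain list
    images, targets = zip(*samples)
    return tf.stack(images, axis=0), list(targets)

loader = DataLoader(train_set, batch_size=16, shuffle=True, collate_fn=my_collate)
print(len(loader))                    # number of batches per epoch
images, targets = next(iter(loader))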
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
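The 8-value-to-corner reshuffle above can be checked in isolation; a minimal NumPy sketch for a single annotation row (the code in this hunk stacks all rows first, hence min/max over axis=1 rather than axis=0):

import numpy as np

row = ["10", "20", "50", "20", "50", "40", "10", "40", "TOTAL"]  # 8 coords + label
poly = np.array(list(map(int, row[:8])), dtype=np.float32).reshape((4, 2))
# straight-box fallback: xmin, ymin, xmax, ymax
straight = np.concatenate((poly.min(axis=0), poly.max(axis=0)))
print(straight)  # [10. 20. 50. 40.]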
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char still not in vocab, return unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
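A small sketch of the padding behaviour (toy vocab; note that `eos` and `pad` must be indices outside the vocab, as the checks above enforce):

import numpy as np
from doctr.datasets.utils import decode_sequence, encode_sequences

vocab = "abc"
out = encode_sequences(["ab", "c"], vocab, target_size=6, eos=3, pad=4)
print(out)
# each word is encoded, followed by one EOS (3), then PAD (4) up to target_size:
# [[0 1 3 4 4 4]
#  [2 3 4 4 4 4]]
print(decode_sequence([0, 1], vocab))  # "ab" - the round trip back to text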
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: a array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
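Both geometry formats are accepted by the helper above; a brief sketch (the image path is a placeholder):

import numpy as np
from doctr.datasets.utils import crop_bboxes_from_image

straight = np.array([[10, 20, 50, 40]])                       # (N, 4): xmin, ymin, xmax, ymax
polys = np.array([[[10, 20], [50, 20], [50, 40], [10, 40]]])  # (N, 4, 2): corner points
crops = crop_bboxes_from_image("path/to/page.jpg", straight)  # list of H x W x 3 uint8 arrays
rcrops = crop_bboxes_from_image("path/to/page.jpg", polys)    # rotated crops via extract_rcrops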
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
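The grouping step above, in isolation (hypothetical two-class target):

import numpy as np

polys = np.zeros((3, 4, 2), dtype=np.float32)        # three dummy polygons
classes = ["words", "artefacts", "words"]            # one class name per polygon
boxes_dict = {k: [] for k in sorted(set(classes))}
for k, poly in zip(classes, polys):
    boxes_dict[k].append(poly)
boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
print({k: v.shape for k, v in boxes_dict.items()})   # {'artefacts': (1, 4, 2), 'words': (2, 4, 2)}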
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
-        geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
-        geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
-        geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
-        orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
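For reference, the removed hierarchy composed like this (a minimal sketch against the old v0.2.0 API deleted in this diff; geometries are relative ((xmin, ymin), (xmax, ymax)) pairs):

from doctr.documents.elements import Block, Document, Line, Page, Word

w1 = Word("Hello", 0.99, ((0.1, 0.1), (0.3, 0.15)))
w2 = Word("world", 0.98, ((0.35, 0.1), (0.5, 0.15)))
# Line and Block resolve their geometry to the smallest enclosing bbox
page = Page(blocks=[Block(lines=[Line([w1, w2])])], page_idx=0, dimensions=(595, 842))
doc = Document(pages=[page])
print(doc.render())    # "Hello world"
export = doc.export()  # nested dict mirroring the element tree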
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
-        the PDF document opened with PyMuPDF (fitz)
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
-        output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 pdf;
-        to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
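The scale computation above maps the requested output size onto the PDF's 72-dpi media box; a numeric sketch for an A4 page:

# A4 at 72 dpi: MediaBox width 595 pt, height 842 pt
output_size = (1024, 726)            # requested (H, W)
scales = (726 / 595, 1024 / 842)     # (x_scale, y_scale), as computed above
print(scales)                        # ~(1.22, 1.22): aspect ratio preserved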
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: expansion ratio used to unclip (re-expand) shrunk polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: The first parameter.
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Keep the candidate polygon with the most points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np array of boxes for the bitmap, each box being a 5-element list
- containing xmin, ymin, xmax, ymax, score in relative coordinates
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Skip contours whose smallest enclosing bounding box is too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score:   # remove polygons with a weak objectness score
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box:  # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channels to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): upsampling factor applied to the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1):  # top-down: each level receives the upsampled deeper one
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature maps is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon: (N, 2) array of coordinates delimiting the polygon boundary
- canvas: threshold map to fill with polygons
- mask: mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, fully masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
- def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
\ No newline at end of file
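
The shrink/unclip geometry used on the deleted DBNet page above is easy to verify in isolation. The following standalone sketch (not docTR code; the 100x100 square is made up for illustration) reproduces the two offset distances, distance = A * (1 - r^2) / L for the training-time shrink and distance = A * unclip_ratio / L for the inference-time expansion, with shapely and pyclipper:

import numpy as np
import pyclipper
from shapely.geometry import Polygon

def offset_polygon(points, distance):
    # Positive distance expands the polygon, negative distance shrinks it
    pco = pyclipper.PyclipperOffset()
    pco.AddPath([tuple(map(int, pt)) for pt in points], pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
    return np.asarray(pco.Execute(distance)[0])

poly = np.array([[0, 0], [100, 0], [100, 100], [0, 100]])  # a 100x100 square
shape = Polygon(poly)

shrink_dist = shape.area * (1 - 0.4 ** 2) / shape.length  # shrink_ratio = 0.4 -> 21.0
unclip_dist = shape.area * 1.5 / shape.length             # unclip_ratio = 1.5 -> 37.5

shrunk = offset_polygon(poly, -shrink_dist)    # ground-truth kernel drawn at train time
expanded = offset_polygon(poly, unclip_dist)   # box re-expansion applied at inference
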
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
-
- doctr.models.detection.linknet - docTR documentation
-
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from LinkNet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np array of boxes for the bitmap, each box being a 5-element list
- containing xmin, ymin, xmax, ymax, score in relative coordinates
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num):  # label 0 is the background component
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score:  # remove polygons with a weak objectness score
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, fully masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
- def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
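
For reference, the connected-component box extraction performed by LinkNetPostProcessor.bitmap_to_boxes above can be reproduced with a few lines of OpenCV. This is a standalone sketch on a made-up toy bitmap, not docTR code:

import cv2
import numpy as np

# Toy binarized segmentation map with two rectangular blobs
bitmap = np.zeros((64, 64), dtype=np.uint8)
bitmap[5:15, 5:25] = 1
bitmap[40:50, 30:60] = 1

label_num, labelimage = cv2.connectedComponents(bitmap, connectivity=4)
height, width = bitmap.shape[:2]
boxes = []
for label in range(1, label_num):  # label 0 is the background component
    # (row, col) indices -> (x, y) points, as int32 for cv2.boundingRect
    points = np.array(np.where(labelimage == label)[::-1]).T.astype(np.int32)
    x, y, w, h = cv2.boundingRect(points)
    # Switch to relative coordinates, as in the post-processor above
    boxes.append([x / width, y / height, (x + w) / width, (y + h) / height])
print(boxes)  # two boxes, one per blob
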
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
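
As the rewritten zoo above shows, detection_predictor now accepts either an architecture name or a pre-built model instance. A short usage sketch against the post-rewrite API (the input page is random data, purely illustrative):

import numpy as np
from doctr.models import detection, detection_predictor

# From an architecture name
predictor = detection_predictor(arch="db_resnet50", pretrained=True)

# Or from a model built beforehand (any DBNet, LinkNet or FAST instance)
model = detection.db_resnet50(pretrained=True)
predictor = detection_predictor(arch=model, assume_straight_pages=True)

page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
out = predictor([page])
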
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
-
- doctr.models.export - docTR documentation
-
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized TFLite model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
-
\ No newline at end of file
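
All three helpers on the deleted page above return a serialized TFLite flatbuffer. One way to sanity-check such an export, sketched here on a toy Keras model rather than a docTR architecture, is to load it back into tf.lite.Interpreter and run a dummy inference:

import numpy as np
import tensorflow as tf

# Toy stand-in for a docTR model
model = tf.keras.Sequential([tf.keras.layers.Conv2D(8, 3, input_shape=(32, 32, 3))])
serialized = tf.lite.TFLiteConverter.from_keras_model(model).convert()

# Load the flatbuffer back and run one inference to validate it
interpreter = tf.lite.Interpreter(model_content=serialized)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=np.float32))
interpreter.invoke()
out = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
print(out.shape)
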
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
-
- doctr.models.recognition.crnn - docTR documentation
-
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with a greedy CTC decoder (tf.nn.ctc_greedy_decoder)
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs CTC decoding of the raw model output, then maps the predicted
- indices to characters with the label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of target strings, encoded internally into gt labels and sequence lengths
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
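
To make the CTC decoding step on the deleted CRNN page concrete, here is a minimal standalone sketch with random logits (the three-character vocab and shapes are made up; tf.nn.ctc_greedy_decoder expects time-major input, hence the transpose):

import tensorflow as tf

vocab = "abc"
batch_size, seq_len, num_classes = 2, 8, len(vocab) + 1  # +1 for the CTC blank

logits = tf.random.normal((batch_size, seq_len, num_classes))
decoded, _ = tf.nn.ctc_greedy_decoder(
    tf.transpose(logits, perm=[1, 0, 2]),  # B x T x C -> T x B x C
    tf.fill([batch_size], seq_len),        # full-length sequences
    merge_repeated=True,
)
# Sparse -> dense, padding with the blank index, then map indices to characters
preds = tf.sparse.to_dense(decoded[0], default_value=len(vocab)).numpy()
words = ["".join(vocab[idx] for idx in row if idx < len(vocab)) for row in preds]
print(words)
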
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
-
- doctr.models.recognition.sar - docTR documentation
-
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
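The masking is the crux of this loss: every timestep past the <eos> position is zeroed out before averaging. A minimal sketch (independent of the model, with dummy per-timestep losses) to check the arithmetic:

>>> import tensorflow as tf
>>> cce = tf.ones((2, 31))                   # dummy per-timestep cross-entropy, shape (N, T)
>>> seq_len = tf.constant([5, 12]) + 1       # +1 keeps the <eos> timestep
>>> mask_2d = tf.sequence_mask(seq_len, 31)  # True up to <eos>, False afterwards
>>> masked = tf.where(mask_2d, cce, tf.zeros_like(cce))
>>> tf.reduce_sum(masked, axis=1) / tf.cast(seq_len, tf.float32)  # -> [1., 1.]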
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
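A usage sketch for the post-processor, assuming it can be instantiated standalone (the import path below is an assumption and may differ across versions):

>>> import tensorflow as tf
>>> from doctr.models.recognition import SARPostProcessor  # assumed import path
>>> postprocessor = SARPostProcessor(vocab='abcdefghijklmnopqrstuvwxyz')
>>> logits = tf.random.uniform((4, 31, 27))  # (N, max_length + 1, vocab_size + 1)
>>> words = postprocessor(logits)            # a list of 4 decoded strings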
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example:
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
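A usage sketch of the updated predictor factory; the word/confidence output format is assumed from recent releases:

>>> import numpy as np
>>> from doctr.models import recognition_predictor
>>> predictor = recognition_predictor('crnn_vgg16_bn', pretrained=True, batch_size=32)
>>> crops = [(255 * np.random.rand(32, 128, 3)).astype(np.uint8)]
>>> out = predictor(crops)  # assumed: a list of (word, confidence) pairs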
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
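A sketch combining several of the new options; the `.language` attribute on the page result is an assumption based on the current Page API:

>>> import numpy as np
>>> from doctr.models import ocr_predictor
>>> model = ocr_predictor(pretrained=True, assume_straight_pages=False,
...                       export_as_straight_boxes=True, detect_language=True)
>>> page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> result = model([page])
>>> result.pages[0].language  # populated because detect_language=True (assumed attribute)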
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+    >>> from doctr.models import kie_predictor
+    >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
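And a matching sketch for the KIE entry point; the `.predictions` layout on the page result is an assumption:

>>> import numpy as np
>>> from doctr.models import kie_predictor
>>> model = kie_predictor(pretrained=True)
>>> page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([page])
>>> out.pages[0].predictions  # assumed: class name -> list of detected objects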
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
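The aspect-ratio branch can be checked directly: a 2:1 image fitted into a square target is shrunk to 16x32 and then zero-padded to the full output size:

>>> import tensorflow as tf
>>> from doctr.transforms import Resize
>>> transfo = Resize((32, 32), preserve_aspect_ratio=True)
>>> out = transfo(tf.random.uniform(shape=[64, 128, 3], minval=0, maxval=1))
>>> out.shape  # TensorShape([32, 32, 3]), with zero padding below the resized content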
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
-        >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
-        >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example:
-        >>> from doctr.transforms import RandomBrightness
-        >>> import tensorflow as tf
-        >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
-        max_delta: the offset added to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example:
-        >>> from doctr.transforms import RandomContrast
-        >>> import tensorflow as tf
-        >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
-        delta: multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example:
-        >>> from doctr.transforms import RandomSaturation
-        >>> import tensorflow as tf
-        >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
-        delta: multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
-        >>> from doctr.transforms import RandomHue
-        >>> import tensorflow as tf
-        >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
-        max_delta: the offset added to the hue channel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example:
-        >>> from doctr.transforms import RandomGamma
-        >>> import tensorflow as tf
-        >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
-        >>> from doctr.transforms import RandomJpegQuality
-        >>> import tensorflow as tf
-        >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
-        min_quality: lower bound of the JPEG quality, int in [0, 100]
-        max_quality: upper bound of the JPEG quality, int in [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
-        >>> from doctr.transforms import OneOf, RandomJpegQuality, RandomGamma
-        >>> import tensorflow as tf
-        >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
-        >>> from doctr.transforms import RandomApply, RandomGamma
-        >>> import tensorflow as tf
-        >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+    # Warning: the order is important here, otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
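For instance, an accented capitalized word compared against its plain lower-case form only passes the most permissive level:

>>> string_match("Crème", "creme")
(False, False, False, True)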
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
            gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
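For instance, a single case-mismatched pair scores on the caseless and unicase levels only:

>>> from doctr.utils import TextMatch
>>> metric = TextMatch()
>>> metric.update(['Hello'], ['hello'])
>>> metric.summary()
{'raw': 0.0, 'caseless': 1.0, 'anyascii': 0.0, 'unicase': 1.0}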
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
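A worked example on two partially overlapping squares (intersection 50x50 = 2500, union 17500, hence IoU = 1/7):

>>> import numpy as np
>>> boxes_1 = np.array([[0, 0, 100, 100]], dtype=np.float32)
>>> boxes_2 = np.array([[50, 50, 150, 150]], dtype=np.float32)
>>> box_iou(boxes_1, boxes_2)  # ~0.1429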
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
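The same 1/7 overlap, expressed with 4-point polygons instead of straight boxes:

>>> import numpy as np
>>> square = np.array([[[0, 0], [100, 0], [100, 100], [0, 100]]], dtype=np.float32)
>>> shifted = square + 50  # translated by (50, 50)
>>> polygon_iou(square, shifted)  # ~0.1429, matching the box_iou example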
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
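A quick check of the greedy suppression: the second box overlaps the first at IoU ~0.82 and is dropped, while the disjoint third box survives:

>>> import numpy as np
>>> boxes = np.array([
...     [0, 0, 100, 100, 0.9],
...     [5, 5, 105, 105, 0.8],
...     [200, 200, 300, 300, 0.7],
... ])
>>> nms(boxes, thresh=0.5)  # -> [0, 2]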
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
-        max_dist: maximum Levenshtein distance between 2 sequences to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
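Following the definitions above, the docstring example works out as follows (the lone overlapping pair has IoU = 0.49, below iou_thresh, so no match is kept):

>>> import numpy as np
>>> from doctr.utils import DetectionMetric
>>> metric = DetectionMetric(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
>>>               np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
>>> metric.summary()  # recall = 0.0, precision = 0.0, mean IoU = (0.49 + 0.0) / 2 ~= 0.24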
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
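As an illustration (values are hypothetical), building a patch for a relative box on a 600x800 page and drawing it:

>>> import matplotlib.pyplot as plt
>>> patch = rect_patch(((0.1, 0.1), (0.4, 0.3)), (600, 800), label="word", color=(0, 0, 1))
>>> fig, ax = plt.subplots()
>>> ax.add_patch(patch)  # absolute coords: x=80, y=60, width=240, height=120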
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if geometry.shape != (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
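Note that polygon_patch scales geometry in place, so pass a copy if the relative coordinates are needed afterwards. A short sketch with hypothetical values:

>>> import numpy as np
>>> poly = np.array([[0.1, 0.1], [0.4, 0.1], [0.4, 0.3], [0.1, 0.3]])
>>> patch = polygon_patch(poly.copy(), (600, 800), label="word", color=(0, 0, 1))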
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
+
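Both geometry formats go through this single entry point, e.g. (illustrative values):

>>> import numpy as np
>>> straight = create_obj_patch(((0.1, 0.1), (0.4, 0.3)), (600, 800))  # -> Rectangle
>>> rotated = create_obj_patch(np.array([[0.1, 0.1], [0.4, 0.1], [0.4, 0.3], [0.1, 0.3]]), (600, 800))  # -> Polygon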
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
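Hues are evenly spaced while lightness and saturation are randomly jittered, so the palette differs between calls:

>>> colors = get_colors(3)  # three (r, g, b) tuples with hues at 0, 120 and 240 degrees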
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+ >>> visualize_page(out.pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mplcursors Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+ >>> visualize_kie_page(out.pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mplcursors Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
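For reference, the page export consumed above is a mapping with one list of predictions per KIE class (a sketch with hypothetical values; the key names follow the code above):

>>> import numpy as np
>>> image = np.zeros((600, 800, 3), dtype=np.uint8)
>>> page = {
>>>     "dimensions": (600, 800),
>>>     "predictions": {
>>>         "date": [{"value": "2021-03-05", "confidence": 0.92, "geometry": ((0.1, 0.1), (0.3, 0.15))}],
>>>     },
>>> }
>>> fig = visualize_kie_page(page, image)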
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
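A usage sketch (boxes are relative, as the docstring states; values are illustrative):

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> image = np.zeros((600, 800, 3), dtype=np.uint8)
>>> draw_boxes(np.array([[0.1, 0.1, 0.4, 0.3]]), image)  # one (xmin, ymin, xmax, ymax) box
>>> plt.show()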
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-..autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developper mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Either performed at once or separately, to each task corresponds a type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors lets you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, we warm-up the model and then we measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection will be used to produces cropped images that will be passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedure. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module regroups non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model performances.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
-
- doctr.datasets - docTR documentation
-
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset forPost-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-..autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = CORD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before passing it to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-¶
-
-
-
-
-
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
-
- doctr.documents - docTR documentation
-
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-size (the page's)
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degress and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) → bytes [source]¶
-Read a web page and convert it into a PDF, returned as a bytes stream
-
-- Example::
->>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
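-Since the returned bytes are a PDF stream, they can be fed straight to the PDF readers below (a sketch):
-
-- Example::
->>> from doctr.documents import read_html, DocumentFile
->>> pdf_doc = DocumentFile.from_pdf(read_html("https://www.yoursite.com"))
-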
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) → PDF [source]¶
-Read a PDF file
-
-- Example::
->>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
--
-classmethod from_url(url: str, **kwargs) → PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
->>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) → List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
->>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
--
-as_images(**kwargs) → List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
->>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
--
-get_words(**kwargs) → List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
->>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of per-page annotations, each represented as a list of (bounding box, value) tuples
-
--
-get_artefacts() → List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
->>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-- Returns:
-the list of per-page artefacts, each represented as a list of bounding boxes
-
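-Putting the methods above together, a typical flow goes from a raw file to numpy pages plus word and artefact annotations:
-
-- Example::
->>> from doctr.documents import DocumentFile
->>> pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
->>> pages = pdf_doc.as_images()  # List[ndarray], one per page
->>> words = pdf_doc.get_words()  # per-page (bounding box, value) tuples
->>> artefacts = pdf_doc.get_artefacts()  # per-page bounding boxes
-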
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
-👩🔬 for research: quickly compare your own architectures’ speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/latest/using_doctr/custom_models_training.html b/latest/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/latest/using_doctr/custom_models_training.html
+++ b/latest/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/latest/using_doctr/running_on_aws.html b/latest/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/latest/using_doctr/running_on_aws.html
+++ b/latest/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/latest/using_doctr/sharing_models.html b/latest/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/latest/using_doctr/sharing_models.html
+++ b/latest/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/latest/using_doctr/using_contrib_modules.html b/latest/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/latest/using_doctr/using_contrib_modules.html
+++ b/latest/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/latest/using_doctr/using_datasets.html b/latest/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/latest/using_doctr/using_datasets.html
+++ b/latest/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/latest/using_doctr/using_model_export.html b/latest/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/latest/using_doctr/using_model_export.html
+++ b/latest/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/latest/using_doctr/using_models.html b/latest/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/latest/using_doctr/using_models.html
+++ b/latest/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/modules/contrib.html b/modules/contrib.html
index 22b0c508a6..b8878635b6 100644
--- a/modules/contrib.html
+++ b/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -376,7 +376,7 @@ Supported contribution modules
-
+
diff --git a/modules/datasets.html b/modules/datasets.html
index 0fe4b78d48..dfcacbc96e 100644
--- a/modules/datasets.html
+++ b/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1077,7 +1077,7 @@ Returns:
-
+
diff --git a/modules/io.html b/modules/io.html
index 924d292c59..77e9e017bf 100644
--- a/modules/io.html
+++ b/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -756,7 +756,7 @@ Returns:¶
-
+
diff --git a/modules/models.html b/modules/models.html
index bf45d11a71..f4a9833365 100644
--- a/modules/models.html
+++ b/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1598,7 +1598,7 @@ Args:¶
-
+
diff --git a/modules/transforms.html b/modules/transforms.html
index 6d77d16e7b..bc254c867b 100644
--- a/modules/transforms.html
+++ b/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -831,7 +831,7 @@ Args:¶
-
+
diff --git a/modules/utils.html b/modules/utils.html
index 3dd3ecbd96..6784d81f6f 100644
--- a/modules/utils.html
+++ b/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -711,7 +711,7 @@ Args:¶
-
+
diff --git a/notebooks.html b/notebooks.html
index f3ea994e49..647f73d4eb 100644
--- a/notebooks.html
+++ b/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -387,7 +387,7 @@ docTR Notebooks
-
+
diff --git a/search.html b/search.html
index f0693e2c97..0e0da5efb3 100644
--- a/search.html
+++ b/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -336,7 +336,7 @@
-
+
diff --git a/searchindex.js b/searchindex.js
index 8598997441..df18967072 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[1, "correction"]], "2. Warning": [[1, "warning"]], "3. Temporary Ban": [[1, "temporary-ban"]], "4. Permanent Ban": [[1, "permanent-ban"]], "AWS Lambda": [[13, null]], "Advanced options": [[18, "advanced-options"]], "Args:": [[6, "args"], [6, "id4"], [6, "id7"], [6, "id10"], [6, "id13"], [6, "id16"], [6, "id19"], [6, "id22"], [6, "id25"], [6, "id29"], [6, "id32"], [6, "id37"], [6, "id40"], [6, "id46"], [6, "id49"], [6, "id50"], [6, "id51"], [6, "id54"], [6, "id57"], [6, "id60"], [6, "id61"], [7, "args"], [7, "id2"], [7, "id3"], [7, "id4"], [7, "id5"], [7, "id6"], [7, "id7"], [7, "id10"], [7, "id12"], [7, "id14"], [7, "id16"], [7, "id20"], [7, "id24"], [7, "id28"], [8, "args"], [8, "id3"], [8, "id8"], [8, "id13"], [8, "id17"], [8, "id21"], [8, "id26"], [8, "id31"], [8, "id36"], [8, "id41"], [8, "id46"], [8, "id50"], [8, "id54"], [8, "id59"], [8, "id63"], [8, "id68"], [8, "id73"], [8, "id77"], [8, "id81"], [8, "id85"], [8, "id90"], [8, "id95"], [8, "id99"], [8, "id104"], [8, "id109"], [8, "id114"], [8, "id119"], [8, "id123"], [8, "id127"], [8, "id132"], [8, "id137"], [8, "id142"], [8, "id146"], [8, "id150"], [8, "id155"], [8, "id159"], [8, "id163"], [8, "id167"], [8, "id169"], [8, "id171"], [8, "id173"], [9, "args"], [9, "id1"], [9, "id2"], [9, "id3"], [9, "id4"], [9, "id5"], [9, "id6"], [9, "id7"], [9, "id8"], [9, "id9"], [9, "id10"], [9, "id11"], [9, "id12"], [9, "id13"], [9, "id14"], [9, "id15"], [9, "id16"], [9, "id17"], [9, "id18"], [9, "id19"], [10, "args"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"]], "Artefact": [[7, "artefact"]], "ArtefactDetection": [[15, "artefactdetection"]], "Attribution": [[1, "attribution"]], "Available Datasets": [[16, "available-datasets"]], "Available architectures": [[18, "available-architectures"], [18, "id1"], [18, "id2"]], "Available contribution modules": [[15, "available-contribution-modules"]], "Block": [[7, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[16, null]], "Choosing the right model": [[18, null]], "Classification": [[14, "classification"]], "Code quality": [[2, "code-quality"]], "Code style verification": [[2, "code-style-verification"]], "Codebase structure": [[2, "codebase-structure"]], "Commits": [[2, "commits"]], "Composing transformations": [[9, "composing-transformations"]], "Continuous Integration": [[2, "continuous-integration"]], "Contributing to docTR": [[2, null]], "Contributor Covenant Code of Conduct": [[1, null]], "Custom dataset loader": [[6, "custom-dataset-loader"]], "Custom orientation classification models": [[12, "custom-orientation-classification-models"]], "Data Loading": [[16, "data-loading"]], "Dataloader": [[6, "dataloader"]], "Detection": [[14, "detection"], [16, "detection"]], "Detection predictors": [[18, "detection-predictors"]], "Developer mode installation": [[2, "developer-mode-installation"]], "Developing docTR": [[2, "developing-doctr"]], "Document": [[7, "document"]], "Document structure": [[7, "document-structure"]], "End-to-End OCR": [[18, "end-to-end-ocr"]], "Enforcement": [[1, "enforcement"]], "Enforcement Guidelines": [[1, "enforcement-guidelines"]], "Enforcement Responsibilities": [[1, "enforcement-responsibilities"]], "Export to ONNX": [[17, "export-to-onnx"]], "Feature requests & bug report": [[2, "feature-requests-bug-report"]], "Feedback": [[2, "feedback"]], "File reading": [[7, "file-reading"]], "Half-precision": [[17, "half-precision"]], "Installation": [[3, null]], 
"Integrate contributions into your pipeline": [[15, null]], "Let\u2019s connect": [[2, "let-s-connect"]], "Line": [[7, "line"]], "Loading from Huggingface Hub": [[14, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[12, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[12, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[4, "main-features"]], "Model optimization": [[17, "model-optimization"]], "Model zoo": [[4, "model-zoo"]], "Modifying the documentation": [[2, "modifying-the-documentation"]], "Naming conventions": [[14, "naming-conventions"]], "OCR": [[16, "ocr"]], "Object Detection": [[16, "object-detection"]], "Our Pledge": [[1, "our-pledge"]], "Our Standards": [[1, "our-standards"]], "Page": [[7, "page"]], "Preparing your model for inference": [[17, null]], "Prerequisites": [[3, "prerequisites"]], "Pretrained community models": [[14, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[14, "pushing-to-the-huggingface-hub"]], "Questions": [[2, "questions"]], "Recognition": [[14, "recognition"], [16, "recognition"]], "Recognition predictors": [[18, "recognition-predictors"]], "Returns:": [[6, "returns"], [7, "returns"], [7, "id11"], [7, "id13"], [7, "id15"], [7, "id19"], [7, "id23"], [7, "id27"], [7, "id31"], [8, "returns"], [8, "id6"], [8, "id11"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id29"], [8, "id34"], [8, "id39"], [8, "id44"], [8, "id49"], [8, "id53"], [8, "id57"], [8, "id62"], [8, "id66"], [8, "id71"], [8, "id76"], [8, "id80"], [8, "id84"], [8, "id88"], [8, "id93"], [8, "id98"], [8, "id102"], [8, "id107"], [8, "id112"], [8, "id117"], [8, "id122"], [8, "id126"], [8, "id130"], [8, "id135"], [8, "id140"], [8, "id145"], [8, "id149"], [8, "id153"], [8, "id158"], [8, "id162"], [8, "id166"], [8, "id168"], [8, "id170"], [8, "id172"], [10, "returns"]], "Scope": [[1, "scope"]], "Share your model with the community": [[14, null]], "Supported Vocabs": [[6, "supported-vocabs"]], "Supported contribution modules": [[5, "supported-contribution-modules"]], "Supported datasets": [[4, "supported-datasets"]], "Supported transformations": [[9, "supported-transformations"]], "Synthetic dataset generator": [[6, "synthetic-dataset-generator"], [16, "synthetic-dataset-generator"]], "Task evaluation": [[10, "task-evaluation"]], "Text Detection": [[18, "text-detection"]], "Text Recognition": [[18, "text-recognition"]], "Text detection models": [[4, "text-detection-models"]], "Text recognition models": [[4, "text-recognition-models"]], "Train your own model": [[12, null]], "Two-stage approaches": [[18, "two-stage-approaches"]], "Unit tests": [[2, "unit-tests"]], "Use your own datasets": [[16, "use-your-own-datasets"]], "Using your ONNX exported model": [[17, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[3, "via-conda-only-for-linux"]], "Via Git": [[3, "via-git"]], "Via Python Package": [[3, "via-python-package"]], "Visualization": [[10, "visualization"]], "What should I do with the output?": [[18, "what-should-i-do-with-the-output"]], "Word": [[7, "word"]], "docTR Notebooks": [[11, null]], "docTR Vocabs": [[6, "id62"]], "docTR: Document Text Recognition": [[4, null]], "doctr.contrib": [[5, null]], "doctr.datasets": [[6, null], [6, "datasets"]], "doctr.io": [[7, null]], "doctr.models": [[8, null]], "doctr.models.classification": [[8, "doctr-models-classification"]], "doctr.models.detection": [[8, "doctr-models-detection"]], "doctr.models.factory": [[8, 
"doctr-models-factory"]], "doctr.models.recognition": [[8, "doctr-models-recognition"]], "doctr.models.zoo": [[8, "doctr-models-zoo"]], "doctr.transforms": [[9, null]], "doctr.utils": [[10, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[7, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[7, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[9, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[6, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[9, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[9, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[6, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[6, "doctr.datasets.loader.DataLoader", false]], 
"db_mobilenet_v3_large() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[8, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[6, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[6, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[7, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[7, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[6, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[6, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[9, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[9, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[6, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[6, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[6, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[6, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[6, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[8, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[9, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[7, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[6, "doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() 
(in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[9, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[8, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[6, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[9, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[7, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[9, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[9, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[9, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[9, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[9, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[9, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[9, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[9, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[9, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[9, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[9, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[9, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[7, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[7, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[7, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[6, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[9, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[8, 
"doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[7, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[7, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[6, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[6, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[6, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[6, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[9, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[10, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[6, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[7, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[6, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[6, 0, 1, "", "CORD"], [6, 0, 1, "", "CharacterGenerator"], [6, 0, 1, "", "DetectionDataset"], [6, 0, 1, "", "DocArtefacts"], [6, 0, 1, "", "FUNSD"], [6, 0, 1, "", "IC03"], [6, 0, 1, "", "IC13"], [6, 0, 1, "", "IIIT5K"], [6, 0, 1, "", "IIITHWS"], [6, 0, 1, "", "IMGUR5K"], [6, 0, 1, "", "MJSynth"], [6, 0, 1, "", "OCRDataset"], [6, 0, 1, "", "RecognitionDataset"], [6, 0, 1, "", "SROIE"], [6, 0, 1, "", "SVHN"], [6, 0, 1, "", "SVT"], [6, 0, 1, "", "SynthText"], [6, 0, 1, "", 
"WILDRECEIPT"], [6, 0, 1, "", "WordGenerator"], [6, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[6, 0, 1, "", "DataLoader"]], "doctr.io": [[7, 0, 1, "", "Artefact"], [7, 0, 1, "", "Block"], [7, 0, 1, "", "Document"], [7, 0, 1, "", "DocumentFile"], [7, 0, 1, "", "Line"], [7, 0, 1, "", "Page"], [7, 0, 1, "", "Word"], [7, 1, 1, "", "decode_img_as_tensor"], [7, 1, 1, "", "read_html"], [7, 1, 1, "", "read_img_as_numpy"], [7, 1, 1, "", "read_img_as_tensor"], [7, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[7, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[7, 2, 1, "", "from_images"], [7, 2, 1, "", "from_pdf"], [7, 2, 1, "", "from_url"]], "doctr.io.Page": [[7, 2, 1, "", "show"]], "doctr.models": [[8, 1, 1, "", "kie_predictor"], [8, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[8, 1, 1, "", "crop_orientation_predictor"], [8, 1, 1, "", "magc_resnet31"], [8, 1, 1, "", "mobilenet_v3_large"], [8, 1, 1, "", "mobilenet_v3_large_r"], [8, 1, 1, "", "mobilenet_v3_small"], [8, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [8, 1, 1, "", "mobilenet_v3_small_page_orientation"], [8, 1, 1, "", "mobilenet_v3_small_r"], [8, 1, 1, "", "page_orientation_predictor"], [8, 1, 1, "", "resnet18"], [8, 1, 1, "", "resnet31"], [8, 1, 1, "", "resnet34"], [8, 1, 1, "", "resnet50"], [8, 1, 1, "", "textnet_base"], [8, 1, 1, "", "textnet_small"], [8, 1, 1, "", "textnet_tiny"], [8, 1, 1, "", "vgg16_bn_r"], [8, 1, 1, "", "vit_b"], [8, 1, 1, "", "vit_s"]], "doctr.models.detection": [[8, 1, 1, "", "db_mobilenet_v3_large"], [8, 1, 1, "", "db_resnet50"], [8, 1, 1, "", "detection_predictor"], [8, 1, 1, "", "fast_base"], [8, 1, 1, "", "fast_small"], [8, 1, 1, "", "fast_tiny"], [8, 1, 1, "", "linknet_resnet18"], [8, 1, 1, "", "linknet_resnet34"], [8, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[8, 1, 1, "", "from_hub"], [8, 1, 1, "", "login_to_hub"], [8, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[8, 1, 1, "", "crnn_mobilenet_v3_large"], [8, 1, 1, "", "crnn_mobilenet_v3_small"], [8, 1, 1, "", "crnn_vgg16_bn"], [8, 1, 1, "", "master"], [8, 1, 1, "", "parseq"], [8, 1, 1, "", "recognition_predictor"], [8, 1, 1, "", "sar_resnet31"], [8, 1, 1, "", "vitstr_base"], [8, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[9, 0, 1, "", "ChannelShuffle"], [9, 0, 1, "", "ColorInversion"], [9, 0, 1, "", "Compose"], [9, 0, 1, "", "GaussianBlur"], [9, 0, 1, "", "GaussianNoise"], [9, 0, 1, "", "LambdaTransformation"], [9, 0, 1, "", "Normalize"], [9, 0, 1, "", "OneOf"], [9, 0, 1, "", "RandomApply"], [9, 0, 1, "", "RandomBrightness"], [9, 0, 1, "", "RandomContrast"], [9, 0, 1, "", "RandomCrop"], [9, 0, 1, "", "RandomGamma"], [9, 0, 1, "", "RandomHorizontalFlip"], [9, 0, 1, "", "RandomHue"], [9, 0, 1, "", "RandomJpegQuality"], [9, 0, 1, "", "RandomResize"], [9, 0, 1, "", "RandomRotate"], [9, 0, 1, "", "RandomSaturation"], [9, 0, 1, "", "RandomShadow"], [9, 0, 1, "", "Resize"], [9, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[10, 0, 1, "", "DetectionMetric"], [10, 0, 1, "", "LocalizationConfusion"], [10, 0, 1, "", "OCRMetric"], [10, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.OCRMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.visualization": [[10, 1, 1, "", "visualize_page"]]}, 
"objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [1, 7, 8, 10, 14, 17], "0": [1, 3, 6, 9, 10, 12, 15, 16, 18], "00": 18, "01": 18, "0123456789": 6, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "02562": 8, "03": 18, "035": 18, "0361328125": 18, "04": 18, "05": 18, "06": 18, "06640625": 18, "07": 18, "08": [9, 18], "09": 18, "0966796875": 18, "1": [6, 7, 8, 9, 10, 12, 16, 18], "10": [6, 10, 18], "100": [6, 9, 10, 16, 18], "1000": 18, "101": 6, "1024": [8, 12, 18], "104": 6, "106": 6, "108": 6, "1095": 16, "11": 18, "110": 10, "1107": 16, "114": 6, "115": 6, "1156": 16, "116": 6, "118": 6, "11800h": 18, "11th": 18, "12": 18, "120": 6, "123": 6, "126": 6, "1268": 16, "128": [8, 12, 17, 18], "13": 18, "130": 6, "13068": 16, "131": 6, "1337891": 16, "1357421875": 18, "1396484375": 18, "14": 18, "1420": 18, "14470v1": 6, "149": 16, "15": 18, "150": [10, 18], "1552": 18, "16": [8, 17, 18], "1630859375": 18, "1684": 18, "16x16": 8, "17": 18, "1778": 18, "1782": 18, "18": [8, 18], "185546875": 18, "1900": 18, "1910": 8, "19342": 16, "19370": 16, "195": 6, "19598": 16, "199": 18, "1999": 18, "2": [3, 4, 6, 7, 9, 15, 18], "20": 18, "200": 10, "2000": 16, "2003": [4, 6], "2012": 6, "2013": [4, 6], "2015": 6, "2019": 4, "207901": 16, "21": 18, "2103": 6, "2186": 16, "21888": 16, "22": 18, "224": [8, 9], "225": 9, "22672": 16, "229": [9, 16], "23": 18, "233": 16, "236": 6, "24": 18, "246": 16, "249": 16, "25": 18, "2504": 18, "255": [7, 8, 9, 10, 18], "256": 8, "257": 16, "26": 18, "26032": 16, "264": 12, "27": 18, "2700": 16, "2710": 18, "2749": 12, "28": 18, "287": 12, "29": 18, "296": 12, "299": 12, "2d": 18, "3": [3, 4, 7, 8, 9, 10, 17, 18], "30": 18, "300": 16, "3000": 16, "301": 12, "30595": 18, "30ghz": 18, "31": 8, "32": [6, 8, 9, 12, 16, 17, 18], "3232421875": 18, "33": [9, 18], "33402": 16, "33608": 16, "34": [8, 18], "340": 18, "3456": 18, "3515625": 18, "36": 18, "360": 16, "37": [6, 18], "38": 18, "39": 18, "4": [8, 9, 10, 18], "40": 18, "406": 9, "41": 18, "42": 18, "43": 18, "44": 18, "45": 18, "456": 9, "46": 18, "47": 18, "472": 16, "48": [6, 18], "485": 9, "49": 18, "49377": 16, "5": [6, 9, 10, 15, 18], "50": [8, 16, 18], "51": 18, "51171875": 18, "512": 8, "52": [6, 18], "529": 18, "53": 18, "54": 18, "540": 18, "5478515625": 18, "55": 18, "56": 18, "57": 18, "58": [6, 18], "580": 18, "5810546875": 18, "583": 18, "59": 18, "597": 18, "5k": [4, 6], "5m": 18, "6": [9, 18], "60": 9, "600": [8, 10, 18], "61": 18, "62": 18, "626": 16, "63": 18, "64": [8, 9, 18], "641": 18, "647": 16, "65": 18, "66": 18, "67": 18, "68": 18, "69": 18, "693": 12, "694": 12, "695": 12, "6m": 18, "7": 18, "70": [6, 10, 18], "707470": 16, "71": [6, 18], "7100000": 16, "7141797": 16, "7149": 16, "72": 18, "72dpi": 7, "73": 18, "73257": 16, "74": 18, "75": [9, 18], "7581382": 16, "76": 18, "77": 18, "772": 12, "772875": 16, "78": 18, "785": 12, "79": 18, "793533": 16, "796": 16, "798": 12, "7m": 18, "8": [8, 9, 18], "80": 18, "800": [8, 10, 16, 18], "81": 18, "82": 18, "83": 18, "84": 18, "849": 16, "85": 18, "8564453125": 18, "857": 18, "85875": 16, "86": 18, "8603515625": 18, "87": 18, "8707": 16, "88": 18, "89": 18, "9": [3, 9, 18], "90": 18, "90k": 6, "90kdict32px": 6, "91": 18, "914085328578949": 18, "92": 18, "93": 18, "94": [6, 18], "95": 
[10, 18], "9578408598899841": 18, "96": 18, "97": 18, "98": 18, "99": 18, "9949972033500671": 18, "A": [1, 2, 4, 6, 7, 8, 11, 17], "As": 2, "Be": 18, "Being": 1, "By": 13, "For": [1, 2, 3, 12, 18], "If": [2, 7, 8, 12, 18], "In": [2, 6, 16], "It": [9, 14, 15, 17], "Its": [4, 8], "No": [1, 18], "Of": 6, "Or": [15, 17], "The": [1, 2, 6, 7, 10, 13, 15, 16, 17, 18], "Then": 8, "To": [2, 3, 13, 14, 15, 17, 18], "_": [1, 6, 8], "__call__": 18, "_build": 2, "_i": 10, "ab": 6, "abc": 17, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "abdef": [6, 16], "abl": [16, 18], "about": [1, 16, 18], "abov": 18, "abstractdataset": 6, "abus": 1, "accept": 1, "access": [4, 7, 16, 18], "account": [1, 14], "accur": 18, "accuraci": 10, "achiev": 17, "act": 1, "action": 1, "activ": 4, "ad": [2, 8, 9], "adapt": 1, "add": [9, 10, 14, 18], "add_hook": 18, "add_label": 10, "addit": [2, 3, 7, 15, 18], "addition": [2, 18], "address": [1, 7], "adjust": 9, "advanc": 1, "advantag": 17, "advis": 2, "aesthet": [4, 6], "affect": 1, "after": [14, 18], "ag": 1, "again": 8, "aggreg": [10, 16], "aggress": 1, "align": [1, 7, 9], "all": [1, 2, 5, 6, 7, 9, 10, 15, 16, 18], "allow": [1, 17], "along": 18, "alreadi": [2, 17], "also": [1, 8, 14, 15, 16, 18], "alwai": 16, "an": [1, 2, 4, 6, 7, 8, 10, 15, 17, 18], "analysi": [7, 15], "ancient_greek": 6, "angl": [7, 9], "ani": [1, 6, 7, 8, 9, 10, 17, 18], "annot": 6, "anot": 16, "anoth": [8, 12, 16], "answer": 1, "anyascii": 10, "anyon": 4, "anyth": 15, "api": [2, 4], "apolog": 1, "apologi": 1, "app": 2, "appear": 1, "appli": [1, 6, 9], "applic": [4, 8], "appoint": 1, "appreci": 14, "appropri": [1, 2, 18], "ar": [1, 2, 3, 5, 6, 7, 9, 10, 11, 15, 16, 18], "arab": 6, "arabic_diacrit": 6, "arabic_lett": 6, "arabic_punctu": 6, "arbitrarili": [4, 8], "arch": [8, 14], "architectur": [4, 8, 14, 15], "area": 18, "argument": [6, 7, 8, 10, 12, 18], "around": 1, "arrai": [7, 9, 10], "art": [4, 15], "artefact": [10, 15, 18], "artefact_typ": 7, "artifici": [4, 6], "arxiv": [6, 8], "asarrai": 10, "ascii_lett": 6, "aspect": [4, 8, 9, 18], "assess": 10, "assign": 10, "associ": 7, "assum": 8, "assume_straight_pag": [8, 12, 18], "astyp": [8, 10, 18], "attack": 1, "attend": [4, 8], "attent": [1, 8], "autom": 4, "automat": 18, "autoregress": [4, 8], "avail": [1, 4, 5, 9], "averag": [9, 18], "avoid": [1, 3], "aw": [4, 18], "awar": 18, "azur": 18, "b": [8, 10, 18], "b_j": 10, "back": 2, "backbon": 8, "backend": 18, "background": 16, "bangla": 6, "bar": 15, "bar_cod": 16, "base": [4, 8, 15], "baselin": [4, 8, 18], "batch": [6, 8, 9, 15, 16, 18], "batch_siz": [6, 12, 15, 16, 17], "bblanchon": 3, "bbox": 18, "becaus": 13, "been": [2, 10, 16, 18], "befor": [6, 8, 9, 18], "begin": 10, "behavior": [1, 18], "being": [10, 18], "belong": 18, "benchmark": 18, "best": 1, "better": [11, 18], "between": [9, 10, 18], "bgr": 7, "bilinear": 9, "bin_thresh": 18, "binar": [4, 8, 18], "binari": [7, 17, 18], "bit": 17, "block": [10, 18], "block_1_1": 18, "blur": 9, "bmvc": 6, "bn": 14, "bodi": [1, 18], "bool": [6, 7, 8, 9, 10], "boolean": [8, 18], "both": [4, 6, 9, 16, 18], "bottom": [8, 18], "bound": [6, 7, 8, 9, 10, 15, 16, 18], "box": [6, 7, 8, 9, 10, 15, 16, 18], "box_thresh": 18, "bright": 9, "browser": [2, 4], "build": [2, 3, 17], "built": 2, "byte": [7, 18], "c": [3, 7, 10], "c_j": 10, "cach": [2, 6, 13], "cache_sampl": 6, "call": 17, "callabl": [6, 9], "can": [2, 3, 12, 13, 14, 15, 16, 18], "capabl": [2, 11, 18], "case": [6, 10], "cf": 18, "cfg": 18, "challeng": 6, "challenge2_test_task12_imag": 6, 
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[1, "correction"]], "2. Warning": [[1, "warning"]], "3. Temporary Ban": [[1, "temporary-ban"]], "4. Permanent Ban": [[1, "permanent-ban"]], "AWS Lambda": [[13, null]], "Advanced options": [[18, "advanced-options"]], "Args:": [[6, "args"], [6, "id4"], [6, "id7"], [6, "id10"], [6, "id13"], [6, "id16"], [6, "id19"], [6, "id22"], [6, "id25"], [6, "id29"], [6, "id32"], [6, "id37"], [6, "id40"], [6, "id46"], [6, "id49"], [6, "id50"], [6, "id51"], [6, "id54"], [6, "id57"], [6, "id60"], [6, "id61"], [7, "args"], [7, "id2"], [7, "id3"], [7, "id4"], [7, "id5"], [7, "id6"], [7, "id7"], [7, "id10"], [7, "id12"], [7, "id14"], [7, "id16"], [7, "id20"], [7, "id24"], [7, "id28"], [8, "args"], [8, "id3"], [8, "id8"], [8, "id13"], [8, "id17"], [8, "id21"], [8, "id26"], [8, "id31"], [8, "id36"], [8, "id41"], [8, "id46"], [8, "id50"], [8, "id54"], [8, "id59"], [8, "id63"], [8, "id68"], [8, "id73"], [8, "id77"], [8, "id81"], [8, "id85"], [8, "id90"], [8, "id95"], [8, "id99"], [8, "id104"], [8, "id109"], [8, "id114"], [8, "id119"], [8, "id123"], [8, "id127"], [8, "id132"], [8, "id137"], [8, "id142"], [8, "id146"], [8, "id150"], [8, "id155"], [8, "id159"], [8, "id163"], [8, "id167"], [8, "id169"], [8, "id171"], [8, "id173"], [9, "args"], [9, "id1"], [9, "id2"], [9, "id3"], [9, "id4"], [9, "id5"], [9, "id6"], [9, "id7"], [9, "id8"], [9, "id9"], [9, "id10"], [9, "id11"], [9, "id12"], [9, "id13"], [9, "id14"], [9, "id15"], [9, "id16"], [9, "id17"], [9, "id18"], [9, "id19"], [10, "args"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"]], "Artefact": [[7, "artefact"]], "ArtefactDetection": [[15, "artefactdetection"]], "Attribution": [[1, "attribution"]], "Available Datasets": [[16, "available-datasets"]], "Available architectures": [[18, "available-architectures"], [18, "id1"], [18, "id2"]], "Available contribution modules": [[15, "available-contribution-modules"]], "Block": [[7, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[16, null]], "Choosing the right model": [[18, null]], "Classification": [[14, "classification"]], "Code quality": [[2, "code-quality"]], "Code style verification": [[2, "code-style-verification"]], "Codebase structure": [[2, "codebase-structure"]], "Commits": [[2, "commits"]], "Composing transformations": [[9, "composing-transformations"]], "Continuous Integration": [[2, "continuous-integration"]], "Contributing to docTR": [[2, null]], "Contributor Covenant Code of Conduct": [[1, null]], "Custom dataset loader": [[6, "custom-dataset-loader"]], "Custom orientation classification models": [[12, "custom-orientation-classification-models"]], "Data Loading": [[16, "data-loading"]], "Dataloader": [[6, "dataloader"]], "Detection": [[14, "detection"], [16, "detection"]], "Detection predictors": [[18, "detection-predictors"]], "Developer mode installation": [[2, "developer-mode-installation"]], "Developing docTR": [[2, "developing-doctr"]], "Document": [[7, "document"]], "Document structure": [[7, "document-structure"]], "End-to-End OCR": [[18, "end-to-end-ocr"]], "Enforcement": [[1, "enforcement"]], "Enforcement Guidelines": [[1, "enforcement-guidelines"]], "Enforcement Responsibilities": [[1, "enforcement-responsibilities"]], "Export to ONNX": [[17, "export-to-onnx"]], "Feature requests & bug report": [[2, "feature-requests-bug-report"]], "Feedback": [[2, "feedback"]], "File reading": [[7, "file-reading"]], "Half-precision": [[17, "half-precision"]], "Installation": [[3, null]], 
"Integrate contributions into your pipeline": [[15, null]], "Let\u2019s connect": [[2, "let-s-connect"]], "Line": [[7, "line"]], "Loading from Huggingface Hub": [[14, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[12, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[12, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[4, "main-features"]], "Model optimization": [[17, "model-optimization"]], "Model zoo": [[4, "model-zoo"]], "Modifying the documentation": [[2, "modifying-the-documentation"]], "Naming conventions": [[14, "naming-conventions"]], "OCR": [[16, "ocr"]], "Object Detection": [[16, "object-detection"]], "Our Pledge": [[1, "our-pledge"]], "Our Standards": [[1, "our-standards"]], "Page": [[7, "page"]], "Preparing your model for inference": [[17, null]], "Prerequisites": [[3, "prerequisites"]], "Pretrained community models": [[14, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[14, "pushing-to-the-huggingface-hub"]], "Questions": [[2, "questions"]], "Recognition": [[14, "recognition"], [16, "recognition"]], "Recognition predictors": [[18, "recognition-predictors"]], "Returns:": [[6, "returns"], [7, "returns"], [7, "id11"], [7, "id13"], [7, "id15"], [7, "id19"], [7, "id23"], [7, "id27"], [7, "id31"], [8, "returns"], [8, "id6"], [8, "id11"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id29"], [8, "id34"], [8, "id39"], [8, "id44"], [8, "id49"], [8, "id53"], [8, "id57"], [8, "id62"], [8, "id66"], [8, "id71"], [8, "id76"], [8, "id80"], [8, "id84"], [8, "id88"], [8, "id93"], [8, "id98"], [8, "id102"], [8, "id107"], [8, "id112"], [8, "id117"], [8, "id122"], [8, "id126"], [8, "id130"], [8, "id135"], [8, "id140"], [8, "id145"], [8, "id149"], [8, "id153"], [8, "id158"], [8, "id162"], [8, "id166"], [8, "id168"], [8, "id170"], [8, "id172"], [10, "returns"]], "Scope": [[1, "scope"]], "Share your model with the community": [[14, null]], "Supported Vocabs": [[6, "supported-vocabs"]], "Supported contribution modules": [[5, "supported-contribution-modules"]], "Supported datasets": [[4, "supported-datasets"]], "Supported transformations": [[9, "supported-transformations"]], "Synthetic dataset generator": [[6, "synthetic-dataset-generator"], [16, "synthetic-dataset-generator"]], "Task evaluation": [[10, "task-evaluation"]], "Text Detection": [[18, "text-detection"]], "Text Recognition": [[18, "text-recognition"]], "Text detection models": [[4, "text-detection-models"]], "Text recognition models": [[4, "text-recognition-models"]], "Train your own model": [[12, null]], "Two-stage approaches": [[18, "two-stage-approaches"]], "Unit tests": [[2, "unit-tests"]], "Use your own datasets": [[16, "use-your-own-datasets"]], "Using your ONNX exported model": [[17, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[3, "via-conda-only-for-linux"]], "Via Git": [[3, "via-git"]], "Via Python Package": [[3, "via-python-package"]], "Visualization": [[10, "visualization"]], "What should I do with the output?": [[18, "what-should-i-do-with-the-output"]], "Word": [[7, "word"]], "docTR Notebooks": [[11, null]], "docTR Vocabs": [[6, "id62"]], "docTR: Document Text Recognition": [[4, null]], "doctr.contrib": [[5, null]], "doctr.datasets": [[6, null], [6, "datasets"]], "doctr.io": [[7, null]], "doctr.models": [[8, null]], "doctr.models.classification": [[8, "doctr-models-classification"]], "doctr.models.detection": [[8, "doctr-models-detection"]], "doctr.models.factory": [[8, 
"doctr-models-factory"]], "doctr.models.recognition": [[8, "doctr-models-recognition"]], "doctr.models.zoo": [[8, "doctr-models-zoo"]], "doctr.transforms": [[9, null]], "doctr.utils": [[10, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[7, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[7, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[9, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[6, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[9, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[9, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[6, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[6, "doctr.datasets.loader.DataLoader", false]], 
"db_mobilenet_v3_large() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[8, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[6, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[6, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[7, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[7, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[6, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[6, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[9, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[9, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[6, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[6, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[6, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[6, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[6, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[8, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[9, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[7, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[6, "doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() 
(in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[9, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[8, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[6, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[9, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[7, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[9, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[9, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[9, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[9, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[9, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[9, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[9, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[9, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[9, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[9, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[9, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[9, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[7, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[7, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[7, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[6, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[9, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[8, 
"doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[7, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[7, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[6, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[6, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[6, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[6, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[9, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[10, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[6, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[7, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[6, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[6, 0, 1, "", "CORD"], [6, 0, 1, "", "CharacterGenerator"], [6, 0, 1, "", "DetectionDataset"], [6, 0, 1, "", "DocArtefacts"], [6, 0, 1, "", "FUNSD"], [6, 0, 1, "", "IC03"], [6, 0, 1, "", "IC13"], [6, 0, 1, "", "IIIT5K"], [6, 0, 1, "", "IIITHWS"], [6, 0, 1, "", "IMGUR5K"], [6, 0, 1, "", "MJSynth"], [6, 0, 1, "", "OCRDataset"], [6, 0, 1, "", "RecognitionDataset"], [6, 0, 1, "", "SROIE"], [6, 0, 1, "", "SVHN"], [6, 0, 1, "", "SVT"], [6, 0, 1, "", "SynthText"], [6, 0, 1, "", 
"WILDRECEIPT"], [6, 0, 1, "", "WordGenerator"], [6, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[6, 0, 1, "", "DataLoader"]], "doctr.io": [[7, 0, 1, "", "Artefact"], [7, 0, 1, "", "Block"], [7, 0, 1, "", "Document"], [7, 0, 1, "", "DocumentFile"], [7, 0, 1, "", "Line"], [7, 0, 1, "", "Page"], [7, 0, 1, "", "Word"], [7, 1, 1, "", "decode_img_as_tensor"], [7, 1, 1, "", "read_html"], [7, 1, 1, "", "read_img_as_numpy"], [7, 1, 1, "", "read_img_as_tensor"], [7, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[7, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[7, 2, 1, "", "from_images"], [7, 2, 1, "", "from_pdf"], [7, 2, 1, "", "from_url"]], "doctr.io.Page": [[7, 2, 1, "", "show"]], "doctr.models": [[8, 1, 1, "", "kie_predictor"], [8, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[8, 1, 1, "", "crop_orientation_predictor"], [8, 1, 1, "", "magc_resnet31"], [8, 1, 1, "", "mobilenet_v3_large"], [8, 1, 1, "", "mobilenet_v3_large_r"], [8, 1, 1, "", "mobilenet_v3_small"], [8, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [8, 1, 1, "", "mobilenet_v3_small_page_orientation"], [8, 1, 1, "", "mobilenet_v3_small_r"], [8, 1, 1, "", "page_orientation_predictor"], [8, 1, 1, "", "resnet18"], [8, 1, 1, "", "resnet31"], [8, 1, 1, "", "resnet34"], [8, 1, 1, "", "resnet50"], [8, 1, 1, "", "textnet_base"], [8, 1, 1, "", "textnet_small"], [8, 1, 1, "", "textnet_tiny"], [8, 1, 1, "", "vgg16_bn_r"], [8, 1, 1, "", "vit_b"], [8, 1, 1, "", "vit_s"]], "doctr.models.detection": [[8, 1, 1, "", "db_mobilenet_v3_large"], [8, 1, 1, "", "db_resnet50"], [8, 1, 1, "", "detection_predictor"], [8, 1, 1, "", "fast_base"], [8, 1, 1, "", "fast_small"], [8, 1, 1, "", "fast_tiny"], [8, 1, 1, "", "linknet_resnet18"], [8, 1, 1, "", "linknet_resnet34"], [8, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[8, 1, 1, "", "from_hub"], [8, 1, 1, "", "login_to_hub"], [8, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[8, 1, 1, "", "crnn_mobilenet_v3_large"], [8, 1, 1, "", "crnn_mobilenet_v3_small"], [8, 1, 1, "", "crnn_vgg16_bn"], [8, 1, 1, "", "master"], [8, 1, 1, "", "parseq"], [8, 1, 1, "", "recognition_predictor"], [8, 1, 1, "", "sar_resnet31"], [8, 1, 1, "", "vitstr_base"], [8, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[9, 0, 1, "", "ChannelShuffle"], [9, 0, 1, "", "ColorInversion"], [9, 0, 1, "", "Compose"], [9, 0, 1, "", "GaussianBlur"], [9, 0, 1, "", "GaussianNoise"], [9, 0, 1, "", "LambdaTransformation"], [9, 0, 1, "", "Normalize"], [9, 0, 1, "", "OneOf"], [9, 0, 1, "", "RandomApply"], [9, 0, 1, "", "RandomBrightness"], [9, 0, 1, "", "RandomContrast"], [9, 0, 1, "", "RandomCrop"], [9, 0, 1, "", "RandomGamma"], [9, 0, 1, "", "RandomHorizontalFlip"], [9, 0, 1, "", "RandomHue"], [9, 0, 1, "", "RandomJpegQuality"], [9, 0, 1, "", "RandomResize"], [9, 0, 1, "", "RandomRotate"], [9, 0, 1, "", "RandomSaturation"], [9, 0, 1, "", "RandomShadow"], [9, 0, 1, "", "Resize"], [9, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[10, 0, 1, "", "DetectionMetric"], [10, 0, 1, "", "LocalizationConfusion"], [10, 0, 1, "", "OCRMetric"], [10, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.OCRMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.visualization": [[10, 1, 1, "", "visualize_page"]]}, 
"objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [1, 7, 8, 10, 14, 17], "0": [1, 3, 6, 9, 10, 12, 15, 16, 18], "00": 18, "01": 18, "0123456789": 6, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "02562": 8, "03": 18, "035": 18, "0361328125": 18, "04": 18, "05": 18, "06": 18, "06640625": 18, "07": 18, "08": [9, 18], "09": 18, "0966796875": 18, "1": [6, 7, 8, 9, 10, 12, 16, 18], "10": [6, 10, 18], "100": [6, 9, 10, 16, 18], "1000": 18, "101": 6, "1024": [8, 12, 18], "104": 6, "106": 6, "108": 6, "1095": 16, "11": 18, "110": 10, "1107": 16, "114": 6, "115": 6, "1156": 16, "116": 6, "118": 6, "11800h": 18, "11th": 18, "12": 18, "120": 6, "123": 6, "126": 6, "1268": 16, "128": [8, 12, 17, 18], "13": 18, "130": 6, "13068": 16, "131": 6, "1337891": 16, "1357421875": 18, "1396484375": 18, "14": 18, "1420": 18, "14470v1": 6, "149": 16, "15": 18, "150": [10, 18], "1552": 18, "16": [8, 17, 18], "1630859375": 18, "1684": 18, "16x16": 8, "17": 18, "1778": 18, "1782": 18, "18": [8, 18], "185546875": 18, "1900": 18, "1910": 8, "19342": 16, "19370": 16, "195": 6, "19598": 16, "199": 18, "1999": 18, "2": [3, 4, 6, 7, 9, 15, 18], "20": 18, "200": 10, "2000": 16, "2003": [4, 6], "2012": 6, "2013": [4, 6], "2015": 6, "2019": 4, "207901": 16, "21": 18, "2103": 6, "2186": 16, "21888": 16, "22": 18, "224": [8, 9], "225": 9, "22672": 16, "229": [9, 16], "23": 18, "233": 16, "236": 6, "24": 18, "246": 16, "249": 16, "25": 18, "2504": 18, "255": [7, 8, 9, 10, 18], "256": 8, "257": 16, "26": 18, "26032": 16, "264": 12, "27": 18, "2700": 16, "2710": 18, "2749": 12, "28": 18, "287": 12, "29": 18, "296": 12, "299": 12, "2d": 18, "3": [3, 4, 7, 8, 9, 10, 17, 18], "30": 18, "300": 16, "3000": 16, "301": 12, "30595": 18, "30ghz": 18, "31": 8, "32": [6, 8, 9, 12, 16, 17, 18], "3232421875": 18, "33": [9, 18], "33402": 16, "33608": 16, "34": [8, 18], "340": 18, "3456": 18, "3515625": 18, "36": 18, "360": 16, "37": [6, 18], "38": 18, "39": 18, "4": [8, 9, 10, 18], "40": 18, "406": 9, "41": 18, "42": 18, "43": 18, "44": 18, "45": 18, "456": 9, "46": 18, "47": 18, "472": 16, "48": [6, 18], "485": 9, "49": 18, "49377": 16, "5": [6, 9, 10, 15, 18], "50": [8, 16, 18], "51": 18, "51171875": 18, "512": 8, "52": [6, 18], "529": 18, "53": 18, "54": 18, "540": 18, "5478515625": 18, "55": 18, "56": 18, "57": 18, "58": [6, 18], "580": 18, "5810546875": 18, "583": 18, "59": 18, "597": 18, "5k": [4, 6], "5m": 18, "6": [9, 18], "60": 9, "600": [8, 10, 18], "61": 18, "62": 18, "626": 16, "63": 18, "64": [8, 9, 18], "641": 18, "647": 16, "65": 18, "66": 18, "67": 18, "68": 18, "69": 18, "693": 12, "694": 12, "695": 12, "6m": 18, "7": 18, "70": [6, 10, 18], "707470": 16, "71": [6, 18], "7100000": 16, "7141797": 16, "7149": 16, "72": 18, "72dpi": 7, "73": 18, "73257": 16, "74": 18, "75": [9, 18], "7581382": 16, "76": 18, "77": 18, "772": 12, "772875": 16, "78": 18, "785": 12, "79": 18, "793533": 16, "796": 16, "798": 12, "7m": 18, "8": [8, 9, 18], "80": 18, "800": [8, 10, 16, 18], "81": 18, "82": 18, "83": 18, "84": 18, "849": 16, "85": 18, "8564453125": 18, "857": 18, "85875": 16, "86": 18, "8603515625": 18, "87": 18, "8707": 16, "88": 18, "89": 18, "9": [3, 9, 18], "90": 18, "90k": 6, "90kdict32px": 6, "91": 18, "914085328578949": 18, "92": 18, "93": 18, "94": [6, 18], "95": 
[10, 18], "9578408598899841": 18, "96": 18, "97": 18, "98": 18, "99": 18, "9949972033500671": 18, "A": [1, 2, 4, 6, 7, 8, 11, 17], "As": 2, "Be": 18, "Being": 1, "By": 13, "For": [1, 2, 3, 12, 18], "If": [2, 7, 8, 12, 18], "In": [2, 6, 16], "It": [9, 14, 15, 17], "Its": [4, 8], "No": [1, 18], "Of": 6, "Or": [15, 17], "The": [1, 2, 6, 7, 10, 13, 15, 16, 17, 18], "Then": 8, "To": [2, 3, 13, 14, 15, 17, 18], "_": [1, 6, 8], "__call__": 18, "_build": 2, "_i": 10, "ab": 6, "abc": 17, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "abdef": [6, 16], "abl": [16, 18], "about": [1, 16, 18], "abov": 18, "abstractdataset": 6, "abus": 1, "accept": 1, "access": [4, 7, 16, 18], "account": [1, 14], "accur": 18, "accuraci": 10, "achiev": 17, "act": 1, "action": 1, "activ": 4, "ad": [2, 8, 9], "adapt": 1, "add": [9, 10, 14, 18], "add_hook": 18, "add_label": 10, "addit": [2, 3, 7, 15, 18], "addition": [2, 18], "address": [1, 7], "adjust": 9, "advanc": 1, "advantag": 17, "advis": 2, "aesthet": [4, 6], "affect": 1, "after": [14, 18], "ag": 1, "again": 8, "aggreg": [10, 16], "aggress": 1, "align": [1, 7, 9], "all": [1, 2, 5, 6, 7, 9, 10, 15, 16, 18], "allow": [1, 17], "along": 18, "alreadi": [2, 17], "also": [1, 8, 14, 15, 16, 18], "alwai": 16, "an": [1, 2, 4, 6, 7, 8, 10, 15, 17, 18], "analysi": [7, 15], "ancient_greek": 6, "angl": [7, 9], "ani": [1, 6, 7, 8, 9, 10, 17, 18], "annot": 6, "anot": 16, "anoth": [8, 12, 16], "answer": 1, "anyascii": 10, "anyon": 4, "anyth": 15, "api": [2, 4], "apolog": 1, "apologi": 1, "app": 2, "appear": 1, "appli": [1, 6, 9], "applic": [4, 8], "appoint": 1, "appreci": 14, "appropri": [1, 2, 18], "ar": [1, 2, 3, 5, 6, 7, 9, 10, 11, 15, 16, 18], "arab": 6, "arabic_diacrit": 6, "arabic_lett": 6, "arabic_punctu": 6, "arbitrarili": [4, 8], "arch": [8, 14], "architectur": [4, 8, 14, 15], "area": 18, "argument": [6, 7, 8, 10, 12, 18], "around": 1, "arrai": [7, 9, 10], "art": [4, 15], "artefact": [10, 15, 18], "artefact_typ": 7, "artifici": [4, 6], "arxiv": [6, 8], "asarrai": 10, "ascii_lett": 6, "aspect": [4, 8, 9, 18], "assess": 10, "assign": 10, "associ": 7, "assum": 8, "assume_straight_pag": [8, 12, 18], "astyp": [8, 10, 18], "attack": 1, "attend": [4, 8], "attent": [1, 8], "autom": 4, "automat": 18, "autoregress": [4, 8], "avail": [1, 4, 5, 9], "averag": [9, 18], "avoid": [1, 3], "aw": [4, 18], "awar": 18, "azur": 18, "b": [8, 10, 18], "b_j": 10, "back": 2, "backbon": 8, "backend": 18, "background": 16, "bangla": 6, "bar": 15, "bar_cod": 16, "base": [4, 8, 15], "baselin": [4, 8, 18], "batch": [6, 8, 9, 15, 16, 18], "batch_siz": [6, 12, 15, 16, 17], "bblanchon": 3, "bbox": 18, "becaus": 13, "been": [2, 10, 16, 18], "befor": [6, 8, 9, 18], "begin": 10, "behavior": [1, 18], "being": [10, 18], "belong": 18, "benchmark": 18, "best": 1, "better": [11, 18], "between": [9, 10, 18], "bgr": 7, "bilinear": 9, "bin_thresh": 18, "binar": [4, 8, 18], "binari": [7, 17, 18], "bit": 17, "block": [10, 18], "block_1_1": 18, "blur": 9, "bmvc": 6, "bn": 14, "bodi": [1, 18], "bool": [6, 7, 8, 9, 10], "boolean": [8, 18], "both": [4, 6, 9, 16, 18], "bottom": [8, 18], "bound": [6, 7, 8, 9, 10, 15, 16, 18], "box": [6, 7, 8, 9, 10, 15, 16, 18], "box_thresh": 18, "bright": 9, "browser": [2, 4], "build": [2, 3, 17], "built": 2, "byte": [7, 18], "c": [3, 7, 10], "c_j": 10, "cach": [2, 6, 13], "cache_sampl": 6, "call": 17, "callabl": [6, 9], "can": [2, 3, 12, 13, 14, 15, 16, 18], "capabl": [2, 11, 18], "case": [6, 10], "cf": 18, "cfg": 18, "challeng": 6, "challenge2_test_task12_imag": 6, 
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
diff --git a/using_doctr/custom_models_training.html b/using_doctr/custom_models_training.html
index 580b4368b7..e664c6a950 100644
--- a/using_doctr/custom_models_training.html
+++ b/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -615,7 +615,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/using_doctr/running_on_aws.html b/using_doctr/running_on_aws.html
index ddb0c3c80f..81c38b49f5 100644
--- a/using_doctr/running_on_aws.html
+++ b/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -358,7 +358,7 @@ AWS Lambda
-
+
diff --git a/using_doctr/sharing_models.html b/using_doctr/sharing_models.html
index 07a3b2f2a3..4f5d1d68a5 100644
--- a/using_doctr/sharing_models.html
+++ b/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -540,7 +540,7 @@ Recognition
-
+
diff --git a/using_doctr/using_contrib_modules.html b/using_doctr/using_contrib_modules.html
index b4a10925e6..cf282ff3a4 100644
--- a/using_doctr/using_contrib_modules.html
+++ b/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -411,7 +411,7 @@ ArtefactDetection
-
+
diff --git a/using_doctr/using_datasets.html b/using_doctr/using_datasets.html
index 4a52df36ba..e30b6d6459 100644
--- a/using_doctr/using_datasets.html
+++ b/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -638,7 +638,7 @@ Data Loading
-
+
diff --git a/using_doctr/using_model_export.html b/using_doctr/using_model_export.html
index 2b30ee63a1..ad9d09ed4c 100644
--- a/using_doctr/using_model_export.html
+++ b/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -463,7 +463,7 @@ Using your ONNX exported model
-
+
diff --git a/using_doctr/using_models.html b/using_doctr/using_models.html
index 13cb06116b..5c80dbf62d 100644
--- a/using_doctr/using_models.html
+++ b/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1249,7 +1249,7 @@ Advanced options
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/cord.html b/v0.1.0/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.0/_modules/doctr/datasets/cord.html
+++ b/v0.1.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/detection.html b/v0.1.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.0/_modules/doctr/datasets/detection.html
+++ b/v0.1.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/funsd.html b/v0.1.0/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.0/_modules/doctr/datasets/funsd.html
+++ b/v0.1.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic03.html b/v0.1.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.0/_modules/doctr/datasets/ic03.html
+++ b/v0.1.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic13.html b/v0.1.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.0/_modules/doctr/datasets/ic13.html
+++ b/v0.1.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiit5k.html b/v0.1.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiithws.html b/v0.1.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/imgur5k.html b/v0.1.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/loader.html b/v0.1.0/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.0/_modules/doctr/datasets/loader.html
+++ b/v0.1.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/mjsynth.html b/v0.1.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ocr.html b/v0.1.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.0/_modules/doctr/datasets/ocr.html
+++ b/v0.1.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/recognition.html b/v0.1.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.0/_modules/doctr/datasets/recognition.html
+++ b/v0.1.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/sroie.html b/v0.1.0/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.0/_modules/doctr/datasets/sroie.html
+++ b/v0.1.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svhn.html b/v0.1.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.0/_modules/doctr/datasets/svhn.html
+++ b/v0.1.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svt.html b/v0.1.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.0/_modules/doctr/datasets/svt.html
+++ b/v0.1.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/synthtext.html b/v0.1.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/utils.html b/v0.1.0/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.0/_modules/doctr/datasets/utils.html
+++ b/v0.1.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/wildreceipt.html b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.0/_modules/doctr/io/elements.html b/v0.1.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.0/_modules/doctr/io/elements.html
+++ b/v0.1.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.0/_modules/doctr/io/html.html b/v0.1.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.0/_modules/doctr/io/html.html
+++ b/v0.1.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/base.html b/v0.1.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.0/_modules/doctr/io/image/base.html
+++ b/v0.1.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/tensorflow.html b/v0.1.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/io/pdf.html b/v0.1.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.0/_modules/doctr/io/pdf.html
+++ b/v0.1.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.0/_modules/doctr/io/reader.html b/v0.1.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.0/_modules/doctr/io/reader.html
+++ b/v0.1.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/zoo.html b/v0.1.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
<
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/zoo.html b/v0.1.0/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/factory/hub.html b/v0.1.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.0/_modules/doctr/models/factory/hub.html
+++ b/v0.1.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/zoo.html b/v0.1.0/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/zoo.html b/v0.1.0/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.0/_modules/doctr/models/zoo.html
+++ b/v0.1.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/base.html b/v0.1.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/utils/metrics.html b/v0.1.0/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.0/_modules/doctr/utils/metrics.html
+++ b/v0.1.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
diff --git a/v0.1.0/_modules/doctr/utils/visualization.html b/v0.1.0/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.0/_modules/doctr/utils/visualization.html
+++ b/v0.1.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
diff --git a/v0.1.0/_modules/index.html b/v0.1.0/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.0/_modules/index.html
+++ b/v0.1.0/_modules/index.html
@@ -13,7 +13,7 @@
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
diff --git a/v0.1.0/_sources/getting_started/installing.rst.txt b/v0.1.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.0/_sources/getting_started/installing.rst.txt
+++ b/v0.1.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.0/_static/basic.css b/v0.1.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.0/_static/basic.css
+++ b/v0.1.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.0/_static/doctools.js b/v0.1.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.0/_static/doctools.js
+++ b/v0.1.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.0/_static/language_data.js b/v0.1.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.0/_static/language_data.js
+++ b/v0.1.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.0/_static/searchtools.js b/v0.1.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.0/_static/searchtools.js
+++ b/v0.1.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
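Note on the searchtools.js hunks above: every search result tuple now ends with a kind field, one of the SearchResultKind values (index, object, text, title), and _displayItem attaches it to each result's list item as a kind-${kind} CSS class. As a minimal sketch of what that hook enables, a theme stylesheet could target the new classes as follows; the selectors mirror the class names added above, but the rules themselves are illustrative assumptions, not part of this diff:
/* Hypothetical theme rules keyed on the kind-* classes that
   _displayItem now adds to each result's list item: */
#search-results ul.search li.kind-title  { font-weight: bold; }      /* page title matches */
#search-results ul.search li.kind-object { font-family: monospace; } /* API object matches */
#search-results ul.search li.kind-index  { font-style: italic; }     /* index entry matches */
#search-results ul.search li.kind-text   { opacity: 0.9; }           /* full-text matches */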
diff --git a/v0.1.0/changelog.html b/v0.1.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.0/changelog.html
+++ b/v0.1.0/changelog.html
@@ -14,7 +14,7 @@
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
diff --git a/v0.1.0/community/resources.html b/v0.1.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.0/community/resources.html
+++ b/v0.1.0/community/resources.html
@@ -14,7 +14,7 @@
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
diff --git a/v0.1.0/contributing/code_of_conduct.html b/v0.1.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.0/contributing/code_of_conduct.html
+++ b/v0.1.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
diff --git a/v0.1.0/contributing/contributing.html b/v0.1.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.0/contributing/contributing.html
+++ b/v0.1.0/contributing/contributing.html
@@ -14,7 +14,7 @@
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
diff --git a/v0.1.0/genindex.html b/v0.1.0/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.0/genindex.html
+++ b/v0.1.0/genindex.html
@@ -13,7 +13,7 @@
Index - docTR documentation
@@ -756,7 +756,7 @@ W
diff --git a/v0.1.0/getting_started/installing.html b/v0.1.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.0/getting_started/installing.html
+++ b/v0.1.0/getting_started/installing.html
@@ -14,7 +14,7 @@
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
diff --git a/v0.1.0/index.html b/v0.1.0/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.0/index.html
+++ b/v0.1.0/index.html
@@ -14,7 +14,7 @@
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
diff --git a/v0.1.0/modules/contrib.html b/v0.1.0/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.0/modules/contrib.html
+++ b/v0.1.0/modules/contrib.html
@@ -14,7 +14,7 @@
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
diff --git a/v0.1.0/modules/datasets.html b/v0.1.0/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.0/modules/datasets.html
+++ b/v0.1.0/modules/datasets.html
@@ -14,7 +14,7 @@
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
diff --git a/v0.1.0/modules/io.html b/v0.1.0/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.0/modules/io.html
+++ b/v0.1.0/modules/io.html
@@ -14,7 +14,7 @@
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
diff --git a/v0.1.0/modules/models.html b/v0.1.0/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.0/modules/models.html
+++ b/v0.1.0/modules/models.html
@@ -14,7 +14,7 @@
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
diff --git a/v0.1.0/modules/transforms.html b/v0.1.0/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.0/modules/transforms.html
+++ b/v0.1.0/modules/transforms.html
@@ -14,7 +14,7 @@
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
Search - docTR documentation
@@ -340,7 +340,7 @@
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
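
The hunk above replaces the entire serialized search index in one line: Sphinx emits searchindex.js as a single Search.setIndex(...) call, so any change to the documented content rewrites the whole file and produces these oversized diffs. As a rough, heavily truncated sketch of what that payload looks like (the field names below appear in the real file; the stub Search object is an assumption standing in for the searchtools.js runtime):

    // Minimal stub for illustration only; the real Search object in
    // searchtools.js parses, scores and renders this index.
    const Search = { setIndex(index) { this._index = index; } };

    Search.setIndex({
      alltitles: { "AWS Lambda": [[14, null]] },    // section title -> [doc id, anchor]
      titles: ["Changelog", "Community resources"], // one title per document
      titleterms: { changelog: 0 },                 // stemmed title term -> doc ids
    });
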
diff --git a/v0.1.0/using_doctr/custom_models_training.html b/v0.1.0/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.0/using_doctr/custom_models_training.html
+++ b/v0.1.0/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.0/using_doctr/running_on_aws.html b/v0.1.0/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.0/using_doctr/running_on_aws.html
+++ b/v0.1.0/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
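
language_data.js carries the language-specific pieces the client-side search depends on: the stopword list shown above plus the stemmer, scorer and splitter mentioned in its header comment. Query terms found in the stopword list carry no search signal and are dropped before scoring. A minimal sketch of that filtering step, under the assumption that the real logic (which lives in searchtools.js and also stems each term) is more involved:

    // Hypothetical helper; shows only the stopword check, not stemming/scoring.
    var stopwords = ["a", "and", "are", "the", "this", "to", "with"];

    function stripStopwords(terms) {
      return terms.filter((t) => !stopwords.includes(t.toLowerCase()));
    }

    console.log(stripStopwords(["the", "search", "index"])); // ["search", "index"]
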
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
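
Taken together, the searchtools.js changes above thread a result "kind" through the whole search pipeline: every result tuple grows from [docname, title, anchor, descr, score, filename] to a seventh kind element, SearchResultKind enumerates the four possible values, and _displayItem stamps each result's <li> with a kind-${kind} class so a theme's CSS can style title, index, object and text matches differently. A condensed sketch of the tagging step, using the names from the diff (the tagResult helper is illustrative, not part of the real file):

    // SearchResultKind as introduced above; four static string values.
    class SearchResultKind {
      static get index()  { return "index"; }
      static get object() { return "object"; }
      static get text()   { return "text"; }
      static get title()  { return "title"; }
    }

    // Illustrative reduction of _displayItem: only the class tagging is shown.
    function tagResult(item) {
      const [docName, title, anchor, descr, score, filename, kind] = item;
      const listItem = document.createElement("li");
      listItem.classList.add(`kind-${kind}`);   // e.g. styled via li.kind-title
      listItem.textContent = title;
      return listItem;
    }

    tagResult(["index", "Installation", "", "", 10, "index.html", SearchResultKind.title]);
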
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
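A minimal usage sketch for the refactored CORD loader, assuming the keyword arguments introduced in this hunk (`use_polygons`, `recognition_task`, `detection_task`); hypothetical illustration, not part of the diff:

from doctr.datasets import CORD

# Default mode: samples pair an image with {"boxes": ..., "labels": ...}
train_set = CORD(train=True, download=True)
img, target = train_set[0]

# Recognition mode: pre-cropped word images paired with their text
reco_set = CORD(train=True, download=True, recognition_task=True)

# Setting both task flags raises a ValueError, per the check in the hunk above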
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets.core - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
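With `doctr/datasets/core.py` deleted in favor of the `datasets` module (see the updated `from .datasets import VisionDataset` lines in the cord, funsd, and sroie hunks), a one-line migration sketch for downstream imports; the absolute path is an assumption inferred from those relative imports:

# Before: the class lived in doctr.datasets.core
# from doctr.datasets.core import VisionDataset
# After this change (assumed absolute path):
from doctr.datasets.datasets import VisionDataset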
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# # List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
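As with CORD above, a hedged sketch of the reworked FUNSD interface; the mutual-exclusion rule comes straight from this hunk, the rest is an assumed usage pattern:

from doctr.datasets import FUNSD

train_set = FUNSD(train=True, download=True)
img, target = train_set[0]  # target: dict with a "boxes" ndarray and a "labels" list

# recognition_task and detection_task cannot both be True (raises ValueError)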
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
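A short sketch of the updated `DataLoader` contract from this hunk: `collate_fn` replaces the removed `workers` argument, and the new `__len__` exposes the batch count. Hypothetical usage, assuming a dataset that implements `__getitem__`:

from doctr.datasets import CORD, DataLoader

train_set = CORD(train=True, download=True)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
print(len(train_loader))                    # number of batches per epoch (new __len__)
images, targets = next(iter(train_loader))  # one collated batch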
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates: 8 flat values -> a (4, 2) array of (x, y) corners
+ # (top left, top right, bottom right, bottom left); empty rows were already filtered above
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
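And the same pattern for SROIE, assuming the three modes this hunk introduces; a sketch, not a verified snippet:

from doctr.datasets import SROIE

test_set = SROIE(train=False, download=True, use_polygons=True)
img, target = test_set[0]  # target["boxes"]: (N, 4, 2) polygon corners

# With detection_task=True, the target is just the coordinates array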
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated into the given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char is still not in vocab, return the unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
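A usage sketch (assuming an image on disk and absolute integer box coordinates; the path is a placeholder):
>>> import numpy as np
>>> from doctr.datasets.utils import crop_bboxes_from_image
>>> crops = crop_bboxes_from_image("path/to/your/doc.jpg", np.array([[10, 20, 110, 70]]))
>>> crops[0].shape  # (ymax - ymin, xmax - xmin, 3)
(50, 100, 3)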
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their class names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
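For example (a sketch; `img` stands for any image tensor accepted by get_img_shape, which depends on the installed backend):
>>> import numpy as np
>>> polys = np.array([[[0, 0], [100, 0], [100, 50], [0, 50]]], dtype=np.float32)
>>> img, box_dict = pre_transform_multiclass(img, (polys, ["words"]))
>>> list(box_dict)  # one (1, 4, 2) array of relative coords per class
['words']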
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- doctr.documents.elements - docTR documentation
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
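Even though this page is deleted in the diff, the element hierarchy it documents composes bottom-up; a minimal sketch (assuming the v0.2.0 doctr.documents layout and arbitrary relative coordinates):
>>> from doctr.documents.elements import Word, Line, Block, Page, Document
>>> words = [Word("hello", 0.99, ((0.1, 0.1), (0.3, 0.2))), Word("world", 0.98, ((0.35, 0.1), (0.6, 0.2)))]
>>> page = Page([Block(lines=[Line(words)])], page_idx=0, dimensions=(896, 1280))
>>> Document([page]).render()
'hello world'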
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- doctr.documents.reader - docTR documentation
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
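A quick sketch of the bytes branch (placeholder path; any valid JPEG or PNG works):
>>> with open("path/to/your/doc.jpg", "rb") as f:
...     page = read_img(f.read(), output_size=(1024, 1024))
>>> page.shape
(1024, 1024, 3)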
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document loaded with PyMuPDF, as a fitz.Document
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
-        output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 pdf;
-            to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
-    # If no output size is specified, keep the original one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
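With the default scales of (2, 2), pages render at roughly 144 dpi, so an A4 page of 595 x 842 points comes out near 1190 x 1684 pixels. A sketch of explicit sizing (placeholder path):
>>> doc = read_pdf("path/to/your/doc.pdf")
>>> np_page = convert_page_to_numpy(doc[0], output_size=(1024, 726))
>>> np_page.shape
(1024, 726, 3)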
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to unshrink polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: The first parameter.
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
-            # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
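For intuition on the offset distance (a back-of-envelope check, not from the original page): a 100 x 100 px square polygon has area 10000 and perimeter 400, so with the default unclip_ratio of 1.5 it is expanded by 10000 * 1.5 / 400 = 37.5 px before the bounding rectangle is taken.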
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
-            if _box is None or _box[2] < min_size_box or _box[3] < min_size_box:  # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
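Note that the size filter scales with the map: for a 1024-px-high bitmap, min_size_box = 1 + 1024 // 512 = 3, so candidate boxes narrower or shorter than 3 px are discarded.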
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
-        channels: number of channels to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
-            dilation_factor (int): upsampling factor applied to the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
-        for idx in range(len(results) - 1, 0, -1):
-            results[idx - 1] += self.upsample(results[idx])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
-        feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature maps is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
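A quick sanity check of the formula: for the point (0, 0) with a = (-1, 0) and b = (1, 0), square_dist_1 = square_dist_2 = 1 and square_dist = 4, so cosin = (4 - 1 - 1) / 2 = 1, the sine term vanishes, and the distance is 0, as expected for a point lying on the segment.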
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon : array of coord., to draw the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
-        seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
-                    seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
-                    seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
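Concretely, the hard-negative mining keeps a 3:1 negative-to-positive ratio: with, say, 200 positive pixels under the mask, at most the 600 highest-loss negative pixels contribute to the balanced BCE term.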
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
-            pred: Pred map from the LinkNet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
-        for label in range(1, label_num):  # label 0 is the background
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
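Tracing the channels for decoder_block(in_chan=512, out_chan=256): a 1x1 conv reduces to 512 // 4 = 128 channels, the 3x3 transposed conv keeps 128 channels while doubling the spatial size, and a final 1x1 conv maps to 256 channels.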
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
-        seg_target = np.zeros(output_shape, dtype=bool)
-        seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
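For reference, the reparameterization step above folds FAST's training-time multi-branch blocks into single convolutions before inference. A minimal sketch of applying it manually, assuming the public fast_base constructor and the reparameterize helper imported in this diff:

from doctr.models import fast_base
from doctr.models.detection.fast import reparameterize

model = fast_base(pretrained=True)
model = reparameterize(model)  # folds branches; lowers latency and memory usage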
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
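With the updated signature above, the predictor accepts either an architecture name or an already-instantiated model. A minimal usage sketch on random data:

import numpy as np
from doctr.models import detection_predictor, db_resnet50

# by name
predictor = detection_predictor(arch="db_resnet50", pretrained=True)
# or by passing a model instance directly
predictor = detection_predictor(arch=db_resnet50(pretrained=True))

page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
out = predictor([page])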
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
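The bytes returned by these converters can be reloaded with the TFLite interpreter. A minimal sketch, assuming serialized_model holds the output of one of the functions above:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_content=serialized_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
# the quantized model expects int8 inputs, the others float32
dummy = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()
result = interpreter.get_tensor(out["index"])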
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with the CTC decoder from the Keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs CTC decoding of the raw model output, then maps the
- predictions to characters with the label_to_idx dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
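To make the decoding above concrete: CTC greedy decoding keeps the argmax symbol per timestep, merges consecutive repeats, and drops the blank token. A small standalone NumPy sketch with a made-up two-character vocabulary:

import numpy as np

vocab = "ab"
blank = len(vocab)  # blank token uses the last index
logits = np.array([
    [0.1, 0.8, 0.1],    # 'b'
    [0.1, 0.8, 0.1],    # 'b' again -> merged with the previous step
    [0.1, 0.1, 0.8],    # blank -> dropped
    [0.9, 0.05, 0.05],  # 'a'
])
path = logits.argmax(-1)
decoded, prev = [], blank
for idx in path:
    if idx != prev and idx != blank:
        decoded.append(vocab[idx])
    prev = idx
print("".join(decoded))  # "ba"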
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of ground-truth labels to be encoded
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
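The masking logic above zeroes the per-timestep loss beyond each word's <eos>. A small standalone sketch of the tf.sequence_mask behavior it relies on:

import tensorflow as tf

seq_len = tf.constant([2, 4])        # per-word lengths (the model adds 1 for <eos>)
mask = tf.sequence_mask(seq_len, 5)  # shape (2, 5), True before each length
cce = tf.ones((2, 5))                # stand-in for the cross-entropy map
masked = tf.where(mask, cce, tf.zeros_like(cce))
per_word = tf.reduce_sum(masked, axis=1) / tf.cast(seq_len, tf.float32)  # [1., 1.]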
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
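As on the detection side, the recognition predictor now takes a name or a model instance. A minimal usage sketch on a random word crop:

import numpy as np
from doctr.models import recognition_predictor

reco = recognition_predictor(arch="crnn_vgg16_bn", pretrained=True)
crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)
out = reco([crop])  # expected: a list of (value, confidence) pairs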
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the general page orientation
+ based on the median line orientation of the segmentation map,
+ then rotates the page before passing it to the detection module again.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
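A minimal end-to-end sketch with the new defaults; the input path is hypothetical:

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

predictor = ocr_predictor(pretrained=True)  # fast_base + crnn_vgg16_bn
doc = DocumentFile.from_pdf("path/to/doc.pdf")  # hypothetical file
result = predictor(doc)
print(result.render())  # plain-text export; result.export() returns a dict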
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the general page orientation
+ based on the median line orientation of the segmentation map,
+ then rotates the page before passing it to the detection module again.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `KIEPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
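The KIE predictor is called the same way, but its result groups detections by class rather than by block and line. A minimal sketch with a hypothetical input file:

from doctr.io import DocumentFile
from doctr.models import kie_predictor

predictor = kie_predictor(pretrained=True)
doc = DocumentFile.from_images(["page.jpg"])  # hypothetical file
result = predictor(doc)
for page in result.pages:
    print(page.predictions.keys())  # mapping from class name to predictions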
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example::
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example::
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example::
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Gamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = JpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomJpegQuality, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
-
-
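Editor's note: the augmentation classes above compose naturally. A minimal sketch, assuming the Compose wrapper documented elsewhere in this module (class names as defined above):

    import tensorflow as tf
    from doctr.transforms import Compose, OneOf, RandomApply, RandomGamma, RandomJpegQuality

    # Apply gamma correction half of the time, then exactly one of two distortions
    augment = Compose([
        RandomApply(RandomGamma(min_gamma=0.8, max_gamma=1.2), p=0.5),
        OneOf([RandomJpegQuality(min_quality=70), RandomGamma()]),
    ])
    img = tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1)
    out = augment(img)  # same shape, randomly distorted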
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accent errors when computing metrics"""
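Editor's illustration of the four tolerance levels of string_match (outputs follow from the definition above):

    raw, caseless, anyascii_m, unicase = string_match("Église", "eglise")
    # raw=False, caseless=False ("église" != "eglise"),
    # anyascii=False ("Eglise" != "eglise"), unicase=True ("eglise" == "eglise")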
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
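Editor's illustration of the resulting summary (values follow from the update logic above):

    metric = TextMatch()
    metric.update(["Hello", "world"], ["hello", "world"])
    metric.summary()
    # {'raw': 0.5, 'caseless': 1.0, 'anyascii': 0.5, 'unicase': 1.0}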
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
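Editor's illustration with two partially overlapping boxes (pure arithmetic, using the function above):

    import numpy as np

    iou = box_iou(np.array([[0, 0, 2, 2]]), np.array([[1, 1, 3, 3]]))
    # intersection = 1 * 1 = 1, union = 4 + 4 - 1 = 7 -> iou ≈ 0.143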
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
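Editor's illustration: the same 1/7 overlap as the axis-aligned example above, expressed as (N, 4, 2) corner arrays:

    import numpy as np

    sq1 = np.array([[[0, 0], [2, 0], [2, 2], [0, 2]]], dtype=np.float32)
    sq2 = np.array([[[1, 1], [3, 1], [3, 3], [1, 3]]], dtype=np.float32)
    polygon_iou(sq1, sq2)  # array([[0.1428...]])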
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
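Editor's illustration of the suppression behaviour (using the function above):

    import numpy as np

    boxes = np.array([
        [0, 0, 10, 10, 0.9],
        [1, 1, 10, 10, 0.8],    # overlaps the first box with IoU 0.81
        [20, 20, 30, 30, 0.7],  # disjoint from both
    ])
    nms(boxes, thresh=0.5)  # -> [0, 2]: the second box is suppressed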
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
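Editor's illustration, working through the docstring example with the code above:

    import numpy as np

    metric = LocalizationConfusion(iou_thresh=0.5)
    metric.update(np.array([[0, 0, 100, 100]]), np.array([[0, 0, 70, 70], [110, 95, 200, 150]]))
    metric.summary()
    # the only overlapping pair has IoU 4900 / 10000 = 0.49 < 0.5,
    # so no match is counted: recall = 0.0, precision = 0.0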
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor was instantiated with preserve_aspect_ratio=True
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor was instantiated with preserve_aspect_ratio=True
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, which needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mpl Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, which needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mpl Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
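Editor's usage sketch for the helper above (image and box values are placeholders):

    import matplotlib.pyplot as plt
    import numpy as np

    image = np.zeros((200, 300, 3), dtype=np.uint8)
    rel_boxes = np.array([[0.1, 0.2, 0.5, 0.6]])  # one relative (xmin, ymin, xmax, ymax) box
    draw_boxes(rel_boxes, image, color=(255, 0, 0))
    plt.show()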
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-.. autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
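Editor's sketch of how these vocabs plug into encode_sequences (the VOCABS mapping and the exact signature are assumed from neighbouring docTR versions, not from this page):

    from doctr.datasets import encode_sequences
    from doctr.datasets.vocabs import VOCABS  # assumed location of the vocabs listed above

    # map strings to fixed-size integer sequences using the "french" vocab
    encoded = encode_sequences(sequences=["hello", "world"], vocab=VOCABS["french"], target_size=10)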
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, two Words on the same horizontal level but in different columns belong to two distinct Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
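Editor's usage sketch of the API documented above (the file path is hypothetical):

    from doctr.documents import DocumentFile

    pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
    pages = pdf_doc.as_images()   # one numpy array per page
    words = pdf_doc.get_words()   # positional words for each page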
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developer mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Whether they are performed jointly or separately, each task calls for a specific type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up, then we measure its average speed over 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.12xlarge AWS instance (Xeon Platinum 8275L CPU) to perform the experiments.
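Editor's sketch of that timing protocol (illustrative only; the docs do not specify whether tensor creation is timed, so we reuse a single random tensor to keep the sketch light):

    import time

    import numpy as np

    def measure_fps(model, input_shape=(1, 1024, 1024, 3), warmup=100, iters=1000):
        x = np.random.rand(*input_shape).astype(np.float32)
        for _ in range(warmup):
            model(x)  # warm-up passes, not timed
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        return iters / (time.perf_counter() - start)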
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following (see the sketch after this list):
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
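Editor's sketch of those three steps in plain TensorFlow (the target size and normalization statistics are placeholders, not DocTR's actual values):

    import tensorflow as tf

    def preprocess_detection(images, target_size=(1024, 1024), mean=0.5, std=0.5):
        # 1. resize with bilinear interpolation, aspect ratio not preserved (potential deformation)
        resized = [tf.image.resize(img, target_size, method="bilinear") for img in images]
        # 2. batch images together
        batch = tf.stack(resized, axis=0)
        # 3. normalize with the (assumed) training-data statistics
        return (batch - mean) / std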
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (a binary segmentation map, for instance) into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors let you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 30595 word-level crops, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 32, 128, 3] as a warm-up, then we measure its average speed over 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.12xlarge AWS instance (Xeon Platinum 8275L CPU) to perform the experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following (a short sketch follows the list):
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
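-
-A minimal sketch of these steps with plain TensorFlow (the normalization statistics are hypothetical placeholders):
-
-    >>> import tensorflow as tf
-    >>> crop = tf.random.uniform([24, 100, 3])
-    >>> # 1. resize without deformation, i.e. preserving the aspect ratio
-    >>> resized = tf.image.resize(crop, [32, 128], method="bilinear", preserve_aspect_ratio=True)
-    >>> # 2. pad to the target size with zeros
-    >>> padded = tf.image.pad_to_bounding_box(resized, 0, 0, 32, 128)
-    >>> # 3. & 4. batch, then normalize with the training data statistics (placeholders)
-    >>> batch = (tf.stack([padded], axis=0) - 0.5) / 0.25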
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification over the sequence) into a set of strings.
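-
-In its simplest (greedy) form, this boils down to picking the most probable symbol at each position and mapping indices back to the vocab, as in the illustrative sketch below (the actual decoding depends on the architecture, e.g. CTC decoding for CRNN):
-
-    >>> import numpy as np
-    >>> vocab = "abcdefghijklmnopqrstuvwxyz"
-    >>> logits = np.random.rand(1, 32, len(vocab) + 1)  # (batch, sequence, classes incl. blank/EOS)
-    >>> best = logits.argmax(-1)[0]
-    >>> decoded = "".join(vocab[idx] for idx in best if idx < len(vocab))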
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Predictors combine the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
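-
-For instance, assuming a pre-cropped word image as a NumPy uint8 array:
-
-    >>> import numpy as np
-    >>> from doctr.models.recognition import recognition_predictor
-    >>> predictor = recognition_predictor(pretrained=True)
-    >>> crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)
-    >>> out = predictor([crop])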
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All the recognition models used in these predictors are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the predictor, warm up the model, and then measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used an AWS c5.12xlarge instance (Intel Xeon Platinum 8275L CPU) to perform the experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-These architectures involve one stage of text detection and one stage of text recognition: the detection output is used to produce cropped images that are then passed to the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
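-
-For instance, assuming a page loaded as a NumPy uint8 array:
-
-    >>> import numpy as np
-    >>> from doctr.models import ocr_predictor
-    >>> model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
-    >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
-    >>> out = model([input_page])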
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
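-
-These helpers presumably build on TensorFlow's TFLite converter; as a rough illustration of the underlying mechanism (a sketch using the plain TensorFlow API, not DocTR's exact implementation):
-
-    >>> import tensorflow as tf
-    >>> from doctr.models import db_resnet50
-    >>> model = db_resnet50(pretrained=True)
-    >>> converter = tf.lite.TFLiteConverter.from_keras_model(model)
-    >>> converter.optimizations = [tf.lite.Optimize.DEFAULT]
-    >>> converter.target_spec.supported_types = [tf.float16]  # for a half-precision variant
-    >>> serialized = converter.convert()  # bytes ready to be written to a .tflite file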
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel <https://www.tensorflow.org/guide/saved_model>`_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both the training and inference procedures. Drawing inspiration from the design of `torchvision <https://pytorch.org/vision/stable/index.html>`_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
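-
-A hypothetical pipeline mixing deterministic and random transformations could look as follows (parameter values are illustrative):
-
-    >>> import tensorflow as tf
-    >>> from doctr.transforms import Compose, Resize, RandomApply, ColorInversion, Normalize
-    >>> transfo = Compose([
-    ...     Resize((32, 128)),
-    ...     RandomApply(ColorInversion(), p=0.5),
-    ...     Normalize(mean=(0.5, 0.5, 0.5), std=(0.25, 0.25, 0.25)),
-    ... ])
-    >>> out = transfo(tf.random.uniform([64, 256, 3]))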
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module gathers non-core features that complement the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
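-
-A typical usage, assuming ``pages`` is a list of NumPy images (e.g. obtained through doctr.documents.DocumentFile):
-
-    >>> import matplotlib.pyplot as plt
-    >>> from doctr.models import ocr_predictor
-    >>> from doctr.utils.visualization import visualize_page
-    >>> model = ocr_predictor(pretrained=True)
-    >>> out = model(pages)
-    >>> visualize_page(out.pages[0].export(), pages[0])
-    >>> plt.show()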
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model's performance.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
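-
-For instance, a localization metric can be updated with ground-truth and predicted boxes, then summarized (the box values below are arbitrary):
-
-    >>> import numpy as np
-    >>> from doctr.utils.metrics import LocalizationConfusion
-    >>> metric = LocalizationConfusion(iou_thresh=0.5)
-    >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
-    >>> metric.summary()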
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
- doctr.datasets - docTR documentation
-
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-¶
-
-
-
-
-
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
-
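-A minimal usage sketch, assuming a lowercase-only vocab (values are illustrative):
->>> from doctr.datasets import encode_sequences
->>> encoded = encode_sequences(sequences=["hello", "world"], vocab="abcdefghijklmnopqrstuvwxyz", target_size=8)
->>> encoded.shape
-(2, 8)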
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
- doctr.documents - docTR documentation
-
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page's size
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same height but in different columns belong to two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and the confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into images in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF file as a bytes stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
docTR Notebooks
-
+
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
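The index entries collapsed above enumerate docTR's public surface: dataset loaders (doctr.datasets), file readers (doctr.io.DocumentFile), predictor factories (doctr.models.ocr_predictor), transforms, and metrics. A minimal sketch of the end-to-end workflow those entries point at, assuming a locally available input file ("sample.pdf" is a hypothetical path):

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Hypothetical input file; from_images() and from_url() work analogously.
doc = DocumentFile.from_pdf("sample.pdf")

# Both architecture names appear in the index above and ship pretrained weights.
predictor = ocr_predictor(det_arch="db_resnet50", reco_arch="crnn_vgg16_bn", pretrained=True)

result = predictor(doc)
print(result.render())

result.render() returns the plain-text transcription, while result.export() yields the nested page/block/line/word dictionary documented on the doctr.io pages.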
diff --git a/latest/using_doctr/custom_models_training.html b/latest/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/latest/using_doctr/custom_models_training.html
+++ b/latest/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
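The hunk above touches the custom-model training page. Going by the load_state_dict, path_to_checkpoint and map_location terms indexed earlier, loading a fine-tuned recognition checkpoint into a predictor looks roughly like this (a sketch assuming the PyTorch backend, a hypothetical checkpoint path, and a vocab matching the training run):

import torch

from doctr.models import crnn_vgg16_bn, ocr_predictor

# Hypothetical checkpoint path; the vocab must match the one used at train time.
reco_model = crnn_vgg16_bn(pretrained=False, vocab="french")
reco_model.load_state_dict(torch.load("path_to_checkpoint.pt", map_location="cpu"))

# The factory accepts a model instance in place of an architecture name.
predictor = ocr_predictor(det_arch="db_resnet50", reco_arch=reco_model, pretrained=True)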
diff --git a/latest/using_doctr/running_on_aws.html b/latest/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/latest/using_doctr/running_on_aws.html
+++ b/latest/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/latest/using_doctr/sharing_models.html b/latest/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/latest/using_doctr/sharing_models.html
+++ b/latest/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/latest/using_doctr/using_contrib_modules.html b/latest/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/latest/using_doctr/using_contrib_modules.html
+++ b/latest/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/latest/using_doctr/using_datasets.html b/latest/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/latest/using_doctr/using_datasets.html
+++ b/latest/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/latest/using_doctr/using_model_export.html b/latest/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/latest/using_doctr/using_model_export.html
+++ b/latest/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
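This hunk belongs to the model-export page. The export_model_to_onnx, dummy_input and input_shape terms in the index correspond to a flow along these lines (a sketch assuming the PyTorch backend; batch size and shape are illustrative, and the handling of the model_name argument is an assumption):

import torch

from doctr.models import vitstr_small
from doctr.models.utils import export_model_to_onnx

batch_size, input_shape = 16, (3, 32, 128)
model = vitstr_small(pretrained=True, exportable=True)
dummy_input = torch.rand((batch_size, *input_shape), dtype=torch.float32)

# Assumed behavior: model_name names the exported file and the written path is returned.
model_path = export_model_to_onnx(model, model_name="vitstr", dummy_input=dummy_input)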
diff --git a/latest/using_doctr/using_models.html b/latest/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/latest/using_doctr/using_models.html
+++ b/latest/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/modules/contrib.html b/modules/contrib.html
index 22b0c508a6..b8878635b6 100644
--- a/modules/contrib.html
+++ b/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -376,7 +376,7 @@ Supported contribution modules
-
+
diff --git a/modules/datasets.html b/modules/datasets.html
index 0fe4b78d48..dfcacbc96e 100644
--- a/modules/datasets.html
+++ b/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1077,7 +1077,7 @@ Returns:
-
+
diff --git a/modules/io.html b/modules/io.html
index 924d292c59..77e9e017bf 100644
--- a/modules/io.html
+++ b/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -756,7 +756,7 @@ Returns:¶
-
+
diff --git a/modules/models.html b/modules/models.html
index bf45d11a71..f4a9833365 100644
--- a/modules/models.html
+++ b/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1598,7 +1598,7 @@ Args:¶
-
+
diff --git a/modules/transforms.html b/modules/transforms.html
index 6d77d16e7b..bc254c867b 100644
--- a/modules/transforms.html
+++ b/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -831,7 +831,7 @@ Args:¶<
-
+
diff --git a/modules/utils.html b/modules/utils.html
index 3dd3ecbd96..6784d81f6f 100644
--- a/modules/utils.html
+++ b/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -711,7 +711,7 @@ Args:¶
-
+
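doctr.utils is the last API module covered by the hunks above; its TextMatch metric, whose update and summary methods appear in the index entries, is driven along these lines (a sketch; the exact keys of the summary dict depend on the installed version):

from doctr.utils.metrics import TextMatch

metric = TextMatch()
metric.update(gt=["Hello", "world"], pred=["hello", "world"])
print(metric.summary())  # match rates at several normalization levels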
diff --git a/notebooks.html b/notebooks.html
index f3ea994e49..647f73d4eb 100644
--- a/notebooks.html
+++ b/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -387,7 +387,7 @@ docTR Notebooks
-
+
diff --git a/search.html b/search.html
index f0693e2c97..0e0da5efb3 100644
--- a/search.html
+++ b/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -336,7 +336,7 @@
-
+
diff --git a/searchindex.js b/searchindex.js
index 8598997441..df18967072 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({ ... [regenerated root "searchindex.js" payload, a near-duplicate of the "latest" index above: alltitles, docnames, filenames, envversion and indexentries maps plus the full-text term list] ...
(in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[9, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[8, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[6, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[9, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[7, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[9, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[9, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[9, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[9, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[9, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[9, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[9, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[9, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[9, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[9, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[9, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[9, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[7, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[7, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[7, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[6, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[9, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[8, 
"doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[7, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[7, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[6, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[6, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[6, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[6, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[9, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[10, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[6, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[7, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[6, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[6, 0, 1, "", "CORD"], [6, 0, 1, "", "CharacterGenerator"], [6, 0, 1, "", "DetectionDataset"], [6, 0, 1, "", "DocArtefacts"], [6, 0, 1, "", "FUNSD"], [6, 0, 1, "", "IC03"], [6, 0, 1, "", "IC13"], [6, 0, 1, "", "IIIT5K"], [6, 0, 1, "", "IIITHWS"], [6, 0, 1, "", "IMGUR5K"], [6, 0, 1, "", "MJSynth"], [6, 0, 1, "", "OCRDataset"], [6, 0, 1, "", "RecognitionDataset"], [6, 0, 1, "", "SROIE"], [6, 0, 1, "", "SVHN"], [6, 0, 1, "", "SVT"], [6, 0, 1, "", "SynthText"], [6, 0, 1, "", 
"WILDRECEIPT"], [6, 0, 1, "", "WordGenerator"], [6, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[6, 0, 1, "", "DataLoader"]], "doctr.io": [[7, 0, 1, "", "Artefact"], [7, 0, 1, "", "Block"], [7, 0, 1, "", "Document"], [7, 0, 1, "", "DocumentFile"], [7, 0, 1, "", "Line"], [7, 0, 1, "", "Page"], [7, 0, 1, "", "Word"], [7, 1, 1, "", "decode_img_as_tensor"], [7, 1, 1, "", "read_html"], [7, 1, 1, "", "read_img_as_numpy"], [7, 1, 1, "", "read_img_as_tensor"], [7, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[7, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[7, 2, 1, "", "from_images"], [7, 2, 1, "", "from_pdf"], [7, 2, 1, "", "from_url"]], "doctr.io.Page": [[7, 2, 1, "", "show"]], "doctr.models": [[8, 1, 1, "", "kie_predictor"], [8, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[8, 1, 1, "", "crop_orientation_predictor"], [8, 1, 1, "", "magc_resnet31"], [8, 1, 1, "", "mobilenet_v3_large"], [8, 1, 1, "", "mobilenet_v3_large_r"], [8, 1, 1, "", "mobilenet_v3_small"], [8, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [8, 1, 1, "", "mobilenet_v3_small_page_orientation"], [8, 1, 1, "", "mobilenet_v3_small_r"], [8, 1, 1, "", "page_orientation_predictor"], [8, 1, 1, "", "resnet18"], [8, 1, 1, "", "resnet31"], [8, 1, 1, "", "resnet34"], [8, 1, 1, "", "resnet50"], [8, 1, 1, "", "textnet_base"], [8, 1, 1, "", "textnet_small"], [8, 1, 1, "", "textnet_tiny"], [8, 1, 1, "", "vgg16_bn_r"], [8, 1, 1, "", "vit_b"], [8, 1, 1, "", "vit_s"]], "doctr.models.detection": [[8, 1, 1, "", "db_mobilenet_v3_large"], [8, 1, 1, "", "db_resnet50"], [8, 1, 1, "", "detection_predictor"], [8, 1, 1, "", "fast_base"], [8, 1, 1, "", "fast_small"], [8, 1, 1, "", "fast_tiny"], [8, 1, 1, "", "linknet_resnet18"], [8, 1, 1, "", "linknet_resnet34"], [8, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[8, 1, 1, "", "from_hub"], [8, 1, 1, "", "login_to_hub"], [8, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[8, 1, 1, "", "crnn_mobilenet_v3_large"], [8, 1, 1, "", "crnn_mobilenet_v3_small"], [8, 1, 1, "", "crnn_vgg16_bn"], [8, 1, 1, "", "master"], [8, 1, 1, "", "parseq"], [8, 1, 1, "", "recognition_predictor"], [8, 1, 1, "", "sar_resnet31"], [8, 1, 1, "", "vitstr_base"], [8, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[9, 0, 1, "", "ChannelShuffle"], [9, 0, 1, "", "ColorInversion"], [9, 0, 1, "", "Compose"], [9, 0, 1, "", "GaussianBlur"], [9, 0, 1, "", "GaussianNoise"], [9, 0, 1, "", "LambdaTransformation"], [9, 0, 1, "", "Normalize"], [9, 0, 1, "", "OneOf"], [9, 0, 1, "", "RandomApply"], [9, 0, 1, "", "RandomBrightness"], [9, 0, 1, "", "RandomContrast"], [9, 0, 1, "", "RandomCrop"], [9, 0, 1, "", "RandomGamma"], [9, 0, 1, "", "RandomHorizontalFlip"], [9, 0, 1, "", "RandomHue"], [9, 0, 1, "", "RandomJpegQuality"], [9, 0, 1, "", "RandomResize"], [9, 0, 1, "", "RandomRotate"], [9, 0, 1, "", "RandomSaturation"], [9, 0, 1, "", "RandomShadow"], [9, 0, 1, "", "Resize"], [9, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[10, 0, 1, "", "DetectionMetric"], [10, 0, 1, "", "LocalizationConfusion"], [10, 0, 1, "", "OCRMetric"], [10, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.OCRMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.visualization": [[10, 1, 1, "", "visualize_page"]]}, 
"objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [1, 7, 8, 10, 14, 17], "0": [1, 3, 6, 9, 10, 12, 15, 16, 18], "00": 18, "01": 18, "0123456789": 6, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "02562": 8, "03": 18, "035": 18, "0361328125": 18, "04": 18, "05": 18, "06": 18, "06640625": 18, "07": 18, "08": [9, 18], "09": 18, "0966796875": 18, "1": [6, 7, 8, 9, 10, 12, 16, 18], "10": [6, 10, 18], "100": [6, 9, 10, 16, 18], "1000": 18, "101": 6, "1024": [8, 12, 18], "104": 6, "106": 6, "108": 6, "1095": 16, "11": 18, "110": 10, "1107": 16, "114": 6, "115": 6, "1156": 16, "116": 6, "118": 6, "11800h": 18, "11th": 18, "12": 18, "120": 6, "123": 6, "126": 6, "1268": 16, "128": [8, 12, 17, 18], "13": 18, "130": 6, "13068": 16, "131": 6, "1337891": 16, "1357421875": 18, "1396484375": 18, "14": 18, "1420": 18, "14470v1": 6, "149": 16, "15": 18, "150": [10, 18], "1552": 18, "16": [8, 17, 18], "1630859375": 18, "1684": 18, "16x16": 8, "17": 18, "1778": 18, "1782": 18, "18": [8, 18], "185546875": 18, "1900": 18, "1910": 8, "19342": 16, "19370": 16, "195": 6, "19598": 16, "199": 18, "1999": 18, "2": [3, 4, 6, 7, 9, 15, 18], "20": 18, "200": 10, "2000": 16, "2003": [4, 6], "2012": 6, "2013": [4, 6], "2015": 6, "2019": 4, "207901": 16, "21": 18, "2103": 6, "2186": 16, "21888": 16, "22": 18, "224": [8, 9], "225": 9, "22672": 16, "229": [9, 16], "23": 18, "233": 16, "236": 6, "24": 18, "246": 16, "249": 16, "25": 18, "2504": 18, "255": [7, 8, 9, 10, 18], "256": 8, "257": 16, "26": 18, "26032": 16, "264": 12, "27": 18, "2700": 16, "2710": 18, "2749": 12, "28": 18, "287": 12, "29": 18, "296": 12, "299": 12, "2d": 18, "3": [3, 4, 7, 8, 9, 10, 17, 18], "30": 18, "300": 16, "3000": 16, "301": 12, "30595": 18, "30ghz": 18, "31": 8, "32": [6, 8, 9, 12, 16, 17, 18], "3232421875": 18, "33": [9, 18], "33402": 16, "33608": 16, "34": [8, 18], "340": 18, "3456": 18, "3515625": 18, "36": 18, "360": 16, "37": [6, 18], "38": 18, "39": 18, "4": [8, 9, 10, 18], "40": 18, "406": 9, "41": 18, "42": 18, "43": 18, "44": 18, "45": 18, "456": 9, "46": 18, "47": 18, "472": 16, "48": [6, 18], "485": 9, "49": 18, "49377": 16, "5": [6, 9, 10, 15, 18], "50": [8, 16, 18], "51": 18, "51171875": 18, "512": 8, "52": [6, 18], "529": 18, "53": 18, "54": 18, "540": 18, "5478515625": 18, "55": 18, "56": 18, "57": 18, "58": [6, 18], "580": 18, "5810546875": 18, "583": 18, "59": 18, "597": 18, "5k": [4, 6], "5m": 18, "6": [9, 18], "60": 9, "600": [8, 10, 18], "61": 18, "62": 18, "626": 16, "63": 18, "64": [8, 9, 18], "641": 18, "647": 16, "65": 18, "66": 18, "67": 18, "68": 18, "69": 18, "693": 12, "694": 12, "695": 12, "6m": 18, "7": 18, "70": [6, 10, 18], "707470": 16, "71": [6, 18], "7100000": 16, "7141797": 16, "7149": 16, "72": 18, "72dpi": 7, "73": 18, "73257": 16, "74": 18, "75": [9, 18], "7581382": 16, "76": 18, "77": 18, "772": 12, "772875": 16, "78": 18, "785": 12, "79": 18, "793533": 16, "796": 16, "798": 12, "7m": 18, "8": [8, 9, 18], "80": 18, "800": [8, 10, 16, 18], "81": 18, "82": 18, "83": 18, "84": 18, "849": 16, "85": 18, "8564453125": 18, "857": 18, "85875": 16, "86": 18, "8603515625": 18, "87": 18, "8707": 16, "88": 18, "89": 18, "9": [3, 9, 18], "90": 18, "90k": 6, "90kdict32px": 6, "91": 18, "914085328578949": 18, "92": 18, "93": 18, "94": [6, 18], "95": 
[10, 18], "9578408598899841": 18, "96": 18, "97": 18, "98": 18, "99": 18, "9949972033500671": 18, "A": [1, 2, 4, 6, 7, 8, 11, 17], "As": 2, "Be": 18, "Being": 1, "By": 13, "For": [1, 2, 3, 12, 18], "If": [2, 7, 8, 12, 18], "In": [2, 6, 16], "It": [9, 14, 15, 17], "Its": [4, 8], "No": [1, 18], "Of": 6, "Or": [15, 17], "The": [1, 2, 6, 7, 10, 13, 15, 16, 17, 18], "Then": 8, "To": [2, 3, 13, 14, 15, 17, 18], "_": [1, 6, 8], "__call__": 18, "_build": 2, "_i": 10, "ab": 6, "abc": 17, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "abdef": [6, 16], "abl": [16, 18], "about": [1, 16, 18], "abov": 18, "abstractdataset": 6, "abus": 1, "accept": 1, "access": [4, 7, 16, 18], "account": [1, 14], "accur": 18, "accuraci": 10, "achiev": 17, "act": 1, "action": 1, "activ": 4, "ad": [2, 8, 9], "adapt": 1, "add": [9, 10, 14, 18], "add_hook": 18, "add_label": 10, "addit": [2, 3, 7, 15, 18], "addition": [2, 18], "address": [1, 7], "adjust": 9, "advanc": 1, "advantag": 17, "advis": 2, "aesthet": [4, 6], "affect": 1, "after": [14, 18], "ag": 1, "again": 8, "aggreg": [10, 16], "aggress": 1, "align": [1, 7, 9], "all": [1, 2, 5, 6, 7, 9, 10, 15, 16, 18], "allow": [1, 17], "along": 18, "alreadi": [2, 17], "also": [1, 8, 14, 15, 16, 18], "alwai": 16, "an": [1, 2, 4, 6, 7, 8, 10, 15, 17, 18], "analysi": [7, 15], "ancient_greek": 6, "angl": [7, 9], "ani": [1, 6, 7, 8, 9, 10, 17, 18], "annot": 6, "anot": 16, "anoth": [8, 12, 16], "answer": 1, "anyascii": 10, "anyon": 4, "anyth": 15, "api": [2, 4], "apolog": 1, "apologi": 1, "app": 2, "appear": 1, "appli": [1, 6, 9], "applic": [4, 8], "appoint": 1, "appreci": 14, "appropri": [1, 2, 18], "ar": [1, 2, 3, 5, 6, 7, 9, 10, 11, 15, 16, 18], "arab": 6, "arabic_diacrit": 6, "arabic_lett": 6, "arabic_punctu": 6, "arbitrarili": [4, 8], "arch": [8, 14], "architectur": [4, 8, 14, 15], "area": 18, "argument": [6, 7, 8, 10, 12, 18], "around": 1, "arrai": [7, 9, 10], "art": [4, 15], "artefact": [10, 15, 18], "artefact_typ": 7, "artifici": [4, 6], "arxiv": [6, 8], "asarrai": 10, "ascii_lett": 6, "aspect": [4, 8, 9, 18], "assess": 10, "assign": 10, "associ": 7, "assum": 8, "assume_straight_pag": [8, 12, 18], "astyp": [8, 10, 18], "attack": 1, "attend": [4, 8], "attent": [1, 8], "autom": 4, "automat": 18, "autoregress": [4, 8], "avail": [1, 4, 5, 9], "averag": [9, 18], "avoid": [1, 3], "aw": [4, 18], "awar": 18, "azur": 18, "b": [8, 10, 18], "b_j": 10, "back": 2, "backbon": 8, "backend": 18, "background": 16, "bangla": 6, "bar": 15, "bar_cod": 16, "base": [4, 8, 15], "baselin": [4, 8, 18], "batch": [6, 8, 9, 15, 16, 18], "batch_siz": [6, 12, 15, 16, 17], "bblanchon": 3, "bbox": 18, "becaus": 13, "been": [2, 10, 16, 18], "befor": [6, 8, 9, 18], "begin": 10, "behavior": [1, 18], "being": [10, 18], "belong": 18, "benchmark": 18, "best": 1, "better": [11, 18], "between": [9, 10, 18], "bgr": 7, "bilinear": 9, "bin_thresh": 18, "binar": [4, 8, 18], "binari": [7, 17, 18], "bit": 17, "block": [10, 18], "block_1_1": 18, "blur": 9, "bmvc": 6, "bn": 14, "bodi": [1, 18], "bool": [6, 7, 8, 9, 10], "boolean": [8, 18], "both": [4, 6, 9, 16, 18], "bottom": [8, 18], "bound": [6, 7, 8, 9, 10, 15, 16, 18], "box": [6, 7, 8, 9, 10, 15, 16, 18], "box_thresh": 18, "bright": 9, "browser": [2, 4], "build": [2, 3, 17], "built": 2, "byte": [7, 18], "c": [3, 7, 10], "c_j": 10, "cach": [2, 6, 13], "cache_sampl": 6, "call": 17, "callabl": [6, 9], "can": [2, 3, 12, 13, 14, 15, 16, 18], "capabl": [2, 11, 18], "case": [6, 10], "cf": 18, "cfg": 18, "challeng": 6, "challenge2_test_task12_imag": 6, 
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[1, "correction"]], "2. Warning": [[1, "warning"]], "3. Temporary Ban": [[1, "temporary-ban"]], "4. Permanent Ban": [[1, "permanent-ban"]], "AWS Lambda": [[13, null]], "Advanced options": [[18, "advanced-options"]], "Args:": [[6, "args"], [6, "id4"], [6, "id7"], [6, "id10"], [6, "id13"], [6, "id16"], [6, "id19"], [6, "id22"], [6, "id25"], [6, "id29"], [6, "id32"], [6, "id37"], [6, "id40"], [6, "id46"], [6, "id49"], [6, "id50"], [6, "id51"], [6, "id54"], [6, "id57"], [6, "id60"], [6, "id61"], [7, "args"], [7, "id2"], [7, "id3"], [7, "id4"], [7, "id5"], [7, "id6"], [7, "id7"], [7, "id10"], [7, "id12"], [7, "id14"], [7, "id16"], [7, "id20"], [7, "id24"], [7, "id28"], [8, "args"], [8, "id3"], [8, "id8"], [8, "id13"], [8, "id17"], [8, "id21"], [8, "id26"], [8, "id31"], [8, "id36"], [8, "id41"], [8, "id46"], [8, "id50"], [8, "id54"], [8, "id59"], [8, "id63"], [8, "id68"], [8, "id73"], [8, "id77"], [8, "id81"], [8, "id85"], [8, "id90"], [8, "id95"], [8, "id99"], [8, "id104"], [8, "id109"], [8, "id114"], [8, "id119"], [8, "id123"], [8, "id127"], [8, "id132"], [8, "id137"], [8, "id142"], [8, "id146"], [8, "id150"], [8, "id155"], [8, "id159"], [8, "id163"], [8, "id167"], [8, "id169"], [8, "id171"], [8, "id173"], [9, "args"], [9, "id1"], [9, "id2"], [9, "id3"], [9, "id4"], [9, "id5"], [9, "id6"], [9, "id7"], [9, "id8"], [9, "id9"], [9, "id10"], [9, "id11"], [9, "id12"], [9, "id13"], [9, "id14"], [9, "id15"], [9, "id16"], [9, "id17"], [9, "id18"], [9, "id19"], [10, "args"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"]], "Artefact": [[7, "artefact"]], "ArtefactDetection": [[15, "artefactdetection"]], "Attribution": [[1, "attribution"]], "Available Datasets": [[16, "available-datasets"]], "Available architectures": [[18, "available-architectures"], [18, "id1"], [18, "id2"]], "Available contribution modules": [[15, "available-contribution-modules"]], "Block": [[7, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[16, null]], "Choosing the right model": [[18, null]], "Classification": [[14, "classification"]], "Code quality": [[2, "code-quality"]], "Code style verification": [[2, "code-style-verification"]], "Codebase structure": [[2, "codebase-structure"]], "Commits": [[2, "commits"]], "Composing transformations": [[9, "composing-transformations"]], "Continuous Integration": [[2, "continuous-integration"]], "Contributing to docTR": [[2, null]], "Contributor Covenant Code of Conduct": [[1, null]], "Custom dataset loader": [[6, "custom-dataset-loader"]], "Custom orientation classification models": [[12, "custom-orientation-classification-models"]], "Data Loading": [[16, "data-loading"]], "Dataloader": [[6, "dataloader"]], "Detection": [[14, "detection"], [16, "detection"]], "Detection predictors": [[18, "detection-predictors"]], "Developer mode installation": [[2, "developer-mode-installation"]], "Developing docTR": [[2, "developing-doctr"]], "Document": [[7, "document"]], "Document structure": [[7, "document-structure"]], "End-to-End OCR": [[18, "end-to-end-ocr"]], "Enforcement": [[1, "enforcement"]], "Enforcement Guidelines": [[1, "enforcement-guidelines"]], "Enforcement Responsibilities": [[1, "enforcement-responsibilities"]], "Export to ONNX": [[17, "export-to-onnx"]], "Feature requests & bug report": [[2, "feature-requests-bug-report"]], "Feedback": [[2, "feedback"]], "File reading": [[7, "file-reading"]], "Half-precision": [[17, "half-precision"]], "Installation": [[3, null]], 
"Integrate contributions into your pipeline": [[15, null]], "Let\u2019s connect": [[2, "let-s-connect"]], "Line": [[7, "line"]], "Loading from Huggingface Hub": [[14, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[12, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[12, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[4, "main-features"]], "Model optimization": [[17, "model-optimization"]], "Model zoo": [[4, "model-zoo"]], "Modifying the documentation": [[2, "modifying-the-documentation"]], "Naming conventions": [[14, "naming-conventions"]], "OCR": [[16, "ocr"]], "Object Detection": [[16, "object-detection"]], "Our Pledge": [[1, "our-pledge"]], "Our Standards": [[1, "our-standards"]], "Page": [[7, "page"]], "Preparing your model for inference": [[17, null]], "Prerequisites": [[3, "prerequisites"]], "Pretrained community models": [[14, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[14, "pushing-to-the-huggingface-hub"]], "Questions": [[2, "questions"]], "Recognition": [[14, "recognition"], [16, "recognition"]], "Recognition predictors": [[18, "recognition-predictors"]], "Returns:": [[6, "returns"], [7, "returns"], [7, "id11"], [7, "id13"], [7, "id15"], [7, "id19"], [7, "id23"], [7, "id27"], [7, "id31"], [8, "returns"], [8, "id6"], [8, "id11"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id29"], [8, "id34"], [8, "id39"], [8, "id44"], [8, "id49"], [8, "id53"], [8, "id57"], [8, "id62"], [8, "id66"], [8, "id71"], [8, "id76"], [8, "id80"], [8, "id84"], [8, "id88"], [8, "id93"], [8, "id98"], [8, "id102"], [8, "id107"], [8, "id112"], [8, "id117"], [8, "id122"], [8, "id126"], [8, "id130"], [8, "id135"], [8, "id140"], [8, "id145"], [8, "id149"], [8, "id153"], [8, "id158"], [8, "id162"], [8, "id166"], [8, "id168"], [8, "id170"], [8, "id172"], [10, "returns"]], "Scope": [[1, "scope"]], "Share your model with the community": [[14, null]], "Supported Vocabs": [[6, "supported-vocabs"]], "Supported contribution modules": [[5, "supported-contribution-modules"]], "Supported datasets": [[4, "supported-datasets"]], "Supported transformations": [[9, "supported-transformations"]], "Synthetic dataset generator": [[6, "synthetic-dataset-generator"], [16, "synthetic-dataset-generator"]], "Task evaluation": [[10, "task-evaluation"]], "Text Detection": [[18, "text-detection"]], "Text Recognition": [[18, "text-recognition"]], "Text detection models": [[4, "text-detection-models"]], "Text recognition models": [[4, "text-recognition-models"]], "Train your own model": [[12, null]], "Two-stage approaches": [[18, "two-stage-approaches"]], "Unit tests": [[2, "unit-tests"]], "Use your own datasets": [[16, "use-your-own-datasets"]], "Using your ONNX exported model": [[17, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[3, "via-conda-only-for-linux"]], "Via Git": [[3, "via-git"]], "Via Python Package": [[3, "via-python-package"]], "Visualization": [[10, "visualization"]], "What should I do with the output?": [[18, "what-should-i-do-with-the-output"]], "Word": [[7, "word"]], "docTR Notebooks": [[11, null]], "docTR Vocabs": [[6, "id62"]], "docTR: Document Text Recognition": [[4, null]], "doctr.contrib": [[5, null]], "doctr.datasets": [[6, null], [6, "datasets"]], "doctr.io": [[7, null]], "doctr.models": [[8, null]], "doctr.models.classification": [[8, "doctr-models-classification"]], "doctr.models.detection": [[8, "doctr-models-detection"]], "doctr.models.factory": [[8, 
"doctr-models-factory"]], "doctr.models.recognition": [[8, "doctr-models-recognition"]], "doctr.models.zoo": [[8, "doctr-models-zoo"]], "doctr.transforms": [[9, null]], "doctr.utils": [[10, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[7, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[7, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[9, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[6, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[9, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[9, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[6, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[6, "doctr.datasets.loader.DataLoader", false]], 
"db_mobilenet_v3_large() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[8, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[6, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[6, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[7, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[7, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[6, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[6, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[9, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[9, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[6, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[6, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[6, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[6, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[6, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[8, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[9, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[7, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[6, "doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() 
(in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[9, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[8, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[6, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[9, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[7, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[9, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[9, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[9, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[9, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[9, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[9, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[9, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[9, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[9, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[9, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[9, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[9, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[7, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[7, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[7, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[6, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[9, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[8, 
"doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[7, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[7, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[6, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[6, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[6, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[6, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[9, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[10, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[6, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[7, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[6, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[6, 0, 1, "", "CORD"], [6, 0, 1, "", "CharacterGenerator"], [6, 0, 1, "", "DetectionDataset"], [6, 0, 1, "", "DocArtefacts"], [6, 0, 1, "", "FUNSD"], [6, 0, 1, "", "IC03"], [6, 0, 1, "", "IC13"], [6, 0, 1, "", "IIIT5K"], [6, 0, 1, "", "IIITHWS"], [6, 0, 1, "", "IMGUR5K"], [6, 0, 1, "", "MJSynth"], [6, 0, 1, "", "OCRDataset"], [6, 0, 1, "", "RecognitionDataset"], [6, 0, 1, "", "SROIE"], [6, 0, 1, "", "SVHN"], [6, 0, 1, "", "SVT"], [6, 0, 1, "", "SynthText"], [6, 0, 1, "", 
"WILDRECEIPT"], [6, 0, 1, "", "WordGenerator"], [6, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[6, 0, 1, "", "DataLoader"]], "doctr.io": [[7, 0, 1, "", "Artefact"], [7, 0, 1, "", "Block"], [7, 0, 1, "", "Document"], [7, 0, 1, "", "DocumentFile"], [7, 0, 1, "", "Line"], [7, 0, 1, "", "Page"], [7, 0, 1, "", "Word"], [7, 1, 1, "", "decode_img_as_tensor"], [7, 1, 1, "", "read_html"], [7, 1, 1, "", "read_img_as_numpy"], [7, 1, 1, "", "read_img_as_tensor"], [7, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[7, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[7, 2, 1, "", "from_images"], [7, 2, 1, "", "from_pdf"], [7, 2, 1, "", "from_url"]], "doctr.io.Page": [[7, 2, 1, "", "show"]], "doctr.models": [[8, 1, 1, "", "kie_predictor"], [8, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[8, 1, 1, "", "crop_orientation_predictor"], [8, 1, 1, "", "magc_resnet31"], [8, 1, 1, "", "mobilenet_v3_large"], [8, 1, 1, "", "mobilenet_v3_large_r"], [8, 1, 1, "", "mobilenet_v3_small"], [8, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [8, 1, 1, "", "mobilenet_v3_small_page_orientation"], [8, 1, 1, "", "mobilenet_v3_small_r"], [8, 1, 1, "", "page_orientation_predictor"], [8, 1, 1, "", "resnet18"], [8, 1, 1, "", "resnet31"], [8, 1, 1, "", "resnet34"], [8, 1, 1, "", "resnet50"], [8, 1, 1, "", "textnet_base"], [8, 1, 1, "", "textnet_small"], [8, 1, 1, "", "textnet_tiny"], [8, 1, 1, "", "vgg16_bn_r"], [8, 1, 1, "", "vit_b"], [8, 1, 1, "", "vit_s"]], "doctr.models.detection": [[8, 1, 1, "", "db_mobilenet_v3_large"], [8, 1, 1, "", "db_resnet50"], [8, 1, 1, "", "detection_predictor"], [8, 1, 1, "", "fast_base"], [8, 1, 1, "", "fast_small"], [8, 1, 1, "", "fast_tiny"], [8, 1, 1, "", "linknet_resnet18"], [8, 1, 1, "", "linknet_resnet34"], [8, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[8, 1, 1, "", "from_hub"], [8, 1, 1, "", "login_to_hub"], [8, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[8, 1, 1, "", "crnn_mobilenet_v3_large"], [8, 1, 1, "", "crnn_mobilenet_v3_small"], [8, 1, 1, "", "crnn_vgg16_bn"], [8, 1, 1, "", "master"], [8, 1, 1, "", "parseq"], [8, 1, 1, "", "recognition_predictor"], [8, 1, 1, "", "sar_resnet31"], [8, 1, 1, "", "vitstr_base"], [8, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[9, 0, 1, "", "ChannelShuffle"], [9, 0, 1, "", "ColorInversion"], [9, 0, 1, "", "Compose"], [9, 0, 1, "", "GaussianBlur"], [9, 0, 1, "", "GaussianNoise"], [9, 0, 1, "", "LambdaTransformation"], [9, 0, 1, "", "Normalize"], [9, 0, 1, "", "OneOf"], [9, 0, 1, "", "RandomApply"], [9, 0, 1, "", "RandomBrightness"], [9, 0, 1, "", "RandomContrast"], [9, 0, 1, "", "RandomCrop"], [9, 0, 1, "", "RandomGamma"], [9, 0, 1, "", "RandomHorizontalFlip"], [9, 0, 1, "", "RandomHue"], [9, 0, 1, "", "RandomJpegQuality"], [9, 0, 1, "", "RandomResize"], [9, 0, 1, "", "RandomRotate"], [9, 0, 1, "", "RandomSaturation"], [9, 0, 1, "", "RandomShadow"], [9, 0, 1, "", "Resize"], [9, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[10, 0, 1, "", "DetectionMetric"], [10, 0, 1, "", "LocalizationConfusion"], [10, 0, 1, "", "OCRMetric"], [10, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.OCRMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.visualization": [[10, 1, 1, "", "visualize_page"]]}, 
"objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [1, 7, 8, 10, 14, 17], "0": [1, 3, 6, 9, 10, 12, 15, 16, 18], "00": 18, "01": 18, "0123456789": 6, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "02562": 8, "03": 18, "035": 18, "0361328125": 18, "04": 18, "05": 18, "06": 18, "06640625": 18, "07": 18, "08": [9, 18], "09": 18, "0966796875": 18, "1": [6, 7, 8, 9, 10, 12, 16, 18], "10": [6, 10, 18], "100": [6, 9, 10, 16, 18], "1000": 18, "101": 6, "1024": [8, 12, 18], "104": 6, "106": 6, "108": 6, "1095": 16, "11": 18, "110": 10, "1107": 16, "114": 6, "115": 6, "1156": 16, "116": 6, "118": 6, "11800h": 18, "11th": 18, "12": 18, "120": 6, "123": 6, "126": 6, "1268": 16, "128": [8, 12, 17, 18], "13": 18, "130": 6, "13068": 16, "131": 6, "1337891": 16, "1357421875": 18, "1396484375": 18, "14": 18, "1420": 18, "14470v1": 6, "149": 16, "15": 18, "150": [10, 18], "1552": 18, "16": [8, 17, 18], "1630859375": 18, "1684": 18, "16x16": 8, "17": 18, "1778": 18, "1782": 18, "18": [8, 18], "185546875": 18, "1900": 18, "1910": 8, "19342": 16, "19370": 16, "195": 6, "19598": 16, "199": 18, "1999": 18, "2": [3, 4, 6, 7, 9, 15, 18], "20": 18, "200": 10, "2000": 16, "2003": [4, 6], "2012": 6, "2013": [4, 6], "2015": 6, "2019": 4, "207901": 16, "21": 18, "2103": 6, "2186": 16, "21888": 16, "22": 18, "224": [8, 9], "225": 9, "22672": 16, "229": [9, 16], "23": 18, "233": 16, "236": 6, "24": 18, "246": 16, "249": 16, "25": 18, "2504": 18, "255": [7, 8, 9, 10, 18], "256": 8, "257": 16, "26": 18, "26032": 16, "264": 12, "27": 18, "2700": 16, "2710": 18, "2749": 12, "28": 18, "287": 12, "29": 18, "296": 12, "299": 12, "2d": 18, "3": [3, 4, 7, 8, 9, 10, 17, 18], "30": 18, "300": 16, "3000": 16, "301": 12, "30595": 18, "30ghz": 18, "31": 8, "32": [6, 8, 9, 12, 16, 17, 18], "3232421875": 18, "33": [9, 18], "33402": 16, "33608": 16, "34": [8, 18], "340": 18, "3456": 18, "3515625": 18, "36": 18, "360": 16, "37": [6, 18], "38": 18, "39": 18, "4": [8, 9, 10, 18], "40": 18, "406": 9, "41": 18, "42": 18, "43": 18, "44": 18, "45": 18, "456": 9, "46": 18, "47": 18, "472": 16, "48": [6, 18], "485": 9, "49": 18, "49377": 16, "5": [6, 9, 10, 15, 18], "50": [8, 16, 18], "51": 18, "51171875": 18, "512": 8, "52": [6, 18], "529": 18, "53": 18, "54": 18, "540": 18, "5478515625": 18, "55": 18, "56": 18, "57": 18, "58": [6, 18], "580": 18, "5810546875": 18, "583": 18, "59": 18, "597": 18, "5k": [4, 6], "5m": 18, "6": [9, 18], "60": 9, "600": [8, 10, 18], "61": 18, "62": 18, "626": 16, "63": 18, "64": [8, 9, 18], "641": 18, "647": 16, "65": 18, "66": 18, "67": 18, "68": 18, "69": 18, "693": 12, "694": 12, "695": 12, "6m": 18, "7": 18, "70": [6, 10, 18], "707470": 16, "71": [6, 18], "7100000": 16, "7141797": 16, "7149": 16, "72": 18, "72dpi": 7, "73": 18, "73257": 16, "74": 18, "75": [9, 18], "7581382": 16, "76": 18, "77": 18, "772": 12, "772875": 16, "78": 18, "785": 12, "79": 18, "793533": 16, "796": 16, "798": 12, "7m": 18, "8": [8, 9, 18], "80": 18, "800": [8, 10, 16, 18], "81": 18, "82": 18, "83": 18, "84": 18, "849": 16, "85": 18, "8564453125": 18, "857": 18, "85875": 16, "86": 18, "8603515625": 18, "87": 18, "8707": 16, "88": 18, "89": 18, "9": [3, 9, 18], "90": 18, "90k": 6, "90kdict32px": 6, "91": 18, "914085328578949": 18, "92": 18, "93": 18, "94": [6, 18], "95": 
[10, 18], "9578408598899841": 18, "96": 18, "97": 18, "98": 18, "99": 18, "9949972033500671": 18, "A": [1, 2, 4, 6, 7, 8, 11, 17], "As": 2, "Be": 18, "Being": 1, "By": 13, "For": [1, 2, 3, 12, 18], "If": [2, 7, 8, 12, 18], "In": [2, 6, 16], "It": [9, 14, 15, 17], "Its": [4, 8], "No": [1, 18], "Of": 6, "Or": [15, 17], "The": [1, 2, 6, 7, 10, 13, 15, 16, 17, 18], "Then": 8, "To": [2, 3, 13, 14, 15, 17, 18], "_": [1, 6, 8], "__call__": 18, "_build": 2, "_i": 10, "ab": 6, "abc": 17, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "abdef": [6, 16], "abl": [16, 18], "about": [1, 16, 18], "abov": 18, "abstractdataset": 6, "abus": 1, "accept": 1, "access": [4, 7, 16, 18], "account": [1, 14], "accur": 18, "accuraci": 10, "achiev": 17, "act": 1, "action": 1, "activ": 4, "ad": [2, 8, 9], "adapt": 1, "add": [9, 10, 14, 18], "add_hook": 18, "add_label": 10, "addit": [2, 3, 7, 15, 18], "addition": [2, 18], "address": [1, 7], "adjust": 9, "advanc": 1, "advantag": 17, "advis": 2, "aesthet": [4, 6], "affect": 1, "after": [14, 18], "ag": 1, "again": 8, "aggreg": [10, 16], "aggress": 1, "align": [1, 7, 9], "all": [1, 2, 5, 6, 7, 9, 10, 15, 16, 18], "allow": [1, 17], "along": 18, "alreadi": [2, 17], "also": [1, 8, 14, 15, 16, 18], "alwai": 16, "an": [1, 2, 4, 6, 7, 8, 10, 15, 17, 18], "analysi": [7, 15], "ancient_greek": 6, "angl": [7, 9], "ani": [1, 6, 7, 8, 9, 10, 17, 18], "annot": 6, "anot": 16, "anoth": [8, 12, 16], "answer": 1, "anyascii": 10, "anyon": 4, "anyth": 15, "api": [2, 4], "apolog": 1, "apologi": 1, "app": 2, "appear": 1, "appli": [1, 6, 9], "applic": [4, 8], "appoint": 1, "appreci": 14, "appropri": [1, 2, 18], "ar": [1, 2, 3, 5, 6, 7, 9, 10, 11, 15, 16, 18], "arab": 6, "arabic_diacrit": 6, "arabic_lett": 6, "arabic_punctu": 6, "arbitrarili": [4, 8], "arch": [8, 14], "architectur": [4, 8, 14, 15], "area": 18, "argument": [6, 7, 8, 10, 12, 18], "around": 1, "arrai": [7, 9, 10], "art": [4, 15], "artefact": [10, 15, 18], "artefact_typ": 7, "artifici": [4, 6], "arxiv": [6, 8], "asarrai": 10, "ascii_lett": 6, "aspect": [4, 8, 9, 18], "assess": 10, "assign": 10, "associ": 7, "assum": 8, "assume_straight_pag": [8, 12, 18], "astyp": [8, 10, 18], "attack": 1, "attend": [4, 8], "attent": [1, 8], "autom": 4, "automat": 18, "autoregress": [4, 8], "avail": [1, 4, 5, 9], "averag": [9, 18], "avoid": [1, 3], "aw": [4, 18], "awar": 18, "azur": 18, "b": [8, 10, 18], "b_j": 10, "back": 2, "backbon": 8, "backend": 18, "background": 16, "bangla": 6, "bar": 15, "bar_cod": 16, "base": [4, 8, 15], "baselin": [4, 8, 18], "batch": [6, 8, 9, 15, 16, 18], "batch_siz": [6, 12, 15, 16, 17], "bblanchon": 3, "bbox": 18, "becaus": 13, "been": [2, 10, 16, 18], "befor": [6, 8, 9, 18], "begin": 10, "behavior": [1, 18], "being": [10, 18], "belong": 18, "benchmark": 18, "best": 1, "better": [11, 18], "between": [9, 10, 18], "bgr": 7, "bilinear": 9, "bin_thresh": 18, "binar": [4, 8, 18], "binari": [7, 17, 18], "bit": 17, "block": [10, 18], "block_1_1": 18, "blur": 9, "bmvc": 6, "bn": 14, "bodi": [1, 18], "bool": [6, 7, 8, 9, 10], "boolean": [8, 18], "both": [4, 6, 9, 16, 18], "bottom": [8, 18], "bound": [6, 7, 8, 9, 10, 15, 16, 18], "box": [6, 7, 8, 9, 10, 15, 16, 18], "box_thresh": 18, "bright": 9, "browser": [2, 4], "build": [2, 3, 17], "built": 2, "byte": [7, 18], "c": [3, 7, 10], "c_j": 10, "cach": [2, 6, 13], "cache_sampl": 6, "call": 17, "callabl": [6, 9], "can": [2, 3, 12, 13, 14, 15, 16, 18], "capabl": [2, 11, 18], "case": [6, 10], "cf": 18, "cfg": 18, "challeng": 6, "challenge2_test_task12_imag": 6, 
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
diff --git a/using_doctr/custom_models_training.html b/using_doctr/custom_models_training.html
index 580b4368b7..e664c6a950 100644
--- a/using_doctr/custom_models_training.html
+++ b/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -615,7 +615,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/using_doctr/running_on_aws.html b/using_doctr/running_on_aws.html
index ddb0c3c80f..81c38b49f5 100644
--- a/using_doctr/running_on_aws.html
+++ b/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -358,7 +358,7 @@ AWS Lambda
-
+
diff --git a/using_doctr/sharing_models.html b/using_doctr/sharing_models.html
index 07a3b2f2a3..4f5d1d68a5 100644
--- a/using_doctr/sharing_models.html
+++ b/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -540,7 +540,7 @@ Recognition
-
+
diff --git a/using_doctr/using_contrib_modules.html b/using_doctr/using_contrib_modules.html
index b4a10925e6..cf282ff3a4 100644
--- a/using_doctr/using_contrib_modules.html
+++ b/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -411,7 +411,7 @@ ArtefactDetection
-
+
diff --git a/using_doctr/using_datasets.html b/using_doctr/using_datasets.html
index 4a52df36ba..e30b6d6459 100644
--- a/using_doctr/using_datasets.html
+++ b/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -638,7 +638,7 @@ Data Loading
-
+
diff --git a/using_doctr/using_model_export.html b/using_doctr/using_model_export.html
index 2b30ee63a1..ad9d09ed4c 100644
--- a/using_doctr/using_model_export.html
+++ b/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -463,7 +463,7 @@ Using your ONNX exported model
-
+
diff --git a/using_doctr/using_models.html b/using_doctr/using_models.html
index 13cb06116b..5c80dbf62d 100644
--- a/using_doctr/using_models.html
+++ b/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1249,7 +1249,7 @@ Advanced options
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/cord.html b/v0.1.0/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.0/_modules/doctr/datasets/cord.html
+++ b/v0.1.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/detection.html b/v0.1.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.0/_modules/doctr/datasets/detection.html
+++ b/v0.1.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/funsd.html b/v0.1.0/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.0/_modules/doctr/datasets/funsd.html
+++ b/v0.1.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic03.html b/v0.1.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.0/_modules/doctr/datasets/ic03.html
+++ b/v0.1.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic13.html b/v0.1.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.0/_modules/doctr/datasets/ic13.html
+++ b/v0.1.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiit5k.html b/v0.1.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiithws.html b/v0.1.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/imgur5k.html b/v0.1.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/loader.html b/v0.1.0/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.0/_modules/doctr/datasets/loader.html
+++ b/v0.1.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/mjsynth.html b/v0.1.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ocr.html b/v0.1.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.0/_modules/doctr/datasets/ocr.html
+++ b/v0.1.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/recognition.html b/v0.1.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.0/_modules/doctr/datasets/recognition.html
+++ b/v0.1.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/sroie.html b/v0.1.0/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.0/_modules/doctr/datasets/sroie.html
+++ b/v0.1.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svhn.html b/v0.1.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.0/_modules/doctr/datasets/svhn.html
+++ b/v0.1.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svt.html b/v0.1.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.0/_modules/doctr/datasets/svt.html
+++ b/v0.1.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/synthtext.html b/v0.1.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/utils.html b/v0.1.0/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.0/_modules/doctr/datasets/utils.html
+++ b/v0.1.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/wildreceipt.html b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.0/_modules/doctr/io/elements.html b/v0.1.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.0/_modules/doctr/io/elements.html
+++ b/v0.1.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.0/_modules/doctr/io/html.html b/v0.1.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.0/_modules/doctr/io/html.html
+++ b/v0.1.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/base.html b/v0.1.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.0/_modules/doctr/io/image/base.html
+++ b/v0.1.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/tensorflow.html b/v0.1.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/io/pdf.html b/v0.1.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.0/_modules/doctr/io/pdf.html
+++ b/v0.1.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.0/_modules/doctr/io/reader.html b/v0.1.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.0/_modules/doctr/io/reader.html
+++ b/v0.1.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/zoo.html b/v0.1.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/zoo.html b/v0.1.0/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/factory/hub.html b/v0.1.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.0/_modules/doctr/models/factory/hub.html
+++ b/v0.1.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/zoo.html b/v0.1.0/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/zoo.html b/v0.1.0/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.0/_modules/doctr/models/zoo.html
+++ b/v0.1.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/base.html b/v0.1.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/utils/metrics.html b/v0.1.0/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.0/_modules/doctr/utils/metrics.html
+++ b/v0.1.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.0/_modules/doctr/utils/visualization.html b/v0.1.0/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.0/_modules/doctr/utils/visualization.html
+++ b/v0.1.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.0/_modules/index.html b/v0.1.0/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.0/_modules/index.html
+++ b/v0.1.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.0/_sources/getting_started/installing.rst.txt b/v0.1.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.0/_sources/getting_started/installing.rst.txt
+++ b/v0.1.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.0/_static/basic.css b/v0.1.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.0/_static/basic.css
+++ b/v0.1.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.0/_static/doctools.js b/v0.1.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.0/_static/doctools.js
+++ b/v0.1.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.0/_static/language_data.js b/v0.1.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.0/_static/language_data.js
+++ b/v0.1.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.0/_static/searchtools.js b/v0.1.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.0/_static/searchtools.js
+++ b/v0.1.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
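For orientation on the searchtools.js hunks above: every search result tuple gains a seventh element, kind, and _displayItem now tags each result's list item with a kind-${kind} class so a theme's stylesheet can distinguish the four kinds (index, object, text, title); the status line also switches from a single "page(s)" string to Documentation.ngettext for proper pluralization. The following standalone sketch is not part of the diff: it restates SearchResultKind from the hunk above and uses hypothetical sample values purely to show the new tuple shape and the derived class name.

// Minimal sketch of the new seven-element result tuple.
// SearchResultKind is restated from the hunk above; the docname,
// title, anchor, score, and filename values are hypothetical.
class SearchResultKind {
  static get index()  { return "index";  }
  static get object() { return "object"; }
  static get text()   { return "text";   }
  static get title()  { return "title";  }
}

// [docname, title, anchor, descr, score, filename, kind]
const result = [
  "using_doctr/using_models", // docname (hypothetical)
  "Choosing the right model", // page title
  "#advanced-options",        // anchor
  null,                       // descr (none for title matches)
  15,                         // score
  "using_models.html",        // filename
  SearchResultKind.title,     // kind: this result matched a title
];

// _displayItem now does listItem.classList.add(`kind-${kind}`),
// so a theme can target e.g. li.kind-title or li.kind-text in CSS.
const kind = result[6];
console.log(`kind-${kind}`); // -> "kind-title"

As the hunks show, kind feeds only the CSS class on each result; ordering is still decided by _orderResultsByScoreThenName on score and name, so the change is purely presentational.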
diff --git a/v0.1.0/changelog.html b/v0.1.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.0/changelog.html
+++ b/v0.1.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.0/community/resources.html b/v0.1.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.0/community/resources.html
+++ b/v0.1.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.0/contributing/code_of_conduct.html b/v0.1.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.0/contributing/code_of_conduct.html
+++ b/v0.1.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.0/contributing/contributing.html b/v0.1.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.0/contributing/contributing.html
+++ b/v0.1.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.0/genindex.html b/v0.1.0/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.0/genindex.html
+++ b/v0.1.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.0/getting_started/installing.html b/v0.1.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.0/getting_started/installing.html
+++ b/v0.1.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.0/index.html b/v0.1.0/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.0/index.html
+++ b/v0.1.0/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.0/modules/contrib.html b/v0.1.0/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.0/modules/contrib.html
+++ b/v0.1.0/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.0/modules/datasets.html b/v0.1.0/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.0/modules/datasets.html
+++ b/v0.1.0/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.0/modules/io.html b/v0.1.0/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.0/modules/io.html
+++ b/v0.1.0/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.0/modules/models.html b/v0.1.0/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.0/modules/models.html
+++ b/v0.1.0/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/transforms.html b/v0.1.0/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.0/modules/transforms.html
+++ b/v0.1.0/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.0/using_doctr/custom_models_training.html b/v0.1.0/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.0/using_doctr/custom_models_training.html
+++ b/v0.1.0/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.0/using_doctr/running_on_aws.html b/v0.1.0/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.0/using_doctr/running_on_aws.html
+++ b/v0.1.0/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
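Note: the array above is the English stopword list the search front-end filters against. A minimal sketch of that filtering, for illustration only — the real tokenizer/splitter lives in searchtools.js, and the array below is an abbreviated copy of the one declared above:
// Illustrative sketch: drop stopwords from a query before matching.
var stopwords = ["a", "and", "the", "to", "with"]; // abbreviated copy for the sketch
const terms = "how to install the library"
  .toLowerCase()
  .split(/\s+/)
  .filter((t) => !stopwords.includes(t));
console.log(terms); // -> ["how", "install", "library"]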
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
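Note: the new SearchResultKind getters resolve to the plain strings "index", "object", "text" and "title", and the following hunk tags each result <li> with a matching kind-* class. A minimal sketch of how a theme script could hook into that, assuming the enum above is loaded — the injected <style> element and the chosen rules are purely illustrative, not part of the diff:
// Illustrative only: style results per kind, reusing the class names
// the enum defines. Neither this <style> nor these rules are in the diff.
const style = document.createElement("style");
style.textContent = `
  ul.search li.kind-${SearchResultKind.title} { font-weight: bold; }
  ul.search li.kind-${SearchResultKind.object} { font-family: monospace; }
`;
document.head.append(style);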
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
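Note: the hunk above replaces a single interpolated string with a gettext-style plural call. A self-contained sketch of the behaviour, using a minimal stand-in for Sphinx's Documentation.ngettext (assumed signature ngettext(singular, plural, n); the stand-in ignores translation catalogs):
// Minimal stand-in: pick singular vs. plural exactly as an untranslated
// catalog would. The real helper lives in Sphinx's doctools.js.
const Documentation = {
  ngettext: (singular, plural, n) => (n === 1 ? singular : plural),
};
const resultCount = 3;
console.log(
  Documentation.ngettext(
    "Search finished, found one page matching the search query.",
    "Search finished, found ${resultCount} pages matching the search query.",
    resultCount,
  ).replace("${resultCount}", resultCount),
);
// -> "Search finished, found 3 pages matching the search query."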
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
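Note: taken together, the searchtools.js hunks extend every result tuple from six to seven fields. A small sketch of what that means for consumers — the sample values are invented, only the shape matches the diff; in JavaScript, destructuring ignores trailing fields, so six-field consumers keep working:
// Hypothetical seven-field result; values are made up for illustration.
const result = [
  "getting_started/installing", // docname
  "Installation",               // title
  "#via-git",                   // anchor
  null,                         // descr
  15,                           // score
  "installing.html",            // filename
  "title",                      // kind (the new field)
];
const [docname, title, anchor, descr, score, filename, kind] = result;
// An older six-field destructuring simply ignores `kind` and still works:
const [d2, t2, a2, de2, s2, f2] = result;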
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶<
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for a recognition task
+ detection_task: whether the dataset should be used for a detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
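For context on the CORD refactor above: the rewritten constructor exposes three mutually exclusive loading modes. A minimal usage sketch, assembled only from the docstring and branches shown in this hunk (the download flag is forwarded to VisionDataset via **kwargs):

from doctr.datasets import CORD

# Default mode: samples are (image path, {"boxes": ..., "labels": ...})
train_set = CORD(train=True, download=True)
img, target = train_set[0]

# Recognition mode: samples become (word crop, text label) pairs
reco_set = CORD(train=True, download=True, recognition_task=True)

# Detection mode: samples become (image path, boxes) pairs
det_set = CORD(train=True, download=True, detection_task=True)

# Passing recognition_task=True and detection_task=True together raises
# the ValueError enforced in the constructor above.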
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets.core - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
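The deletion above removes AbstractDataset and VisionDataset from doctr.datasets.core; judging by the updated imports elsewhere in this diff (from .datasets import VisionDataset), the class now lives in doctr/datasets/datasets.py. A minimal subclass sketch against the constructor signature shown in the deleted file (the URL and file name are placeholders, and note the newer class additionally accepts a pre_transforms keyword, as the CORD call above shows):

from doctr.datasets.datasets import VisionDataset

class MyZipDataset(VisionDataset):
    """Hypothetical dataset backed by a remote zip archive."""

    def __init__(self, download: bool = True) -> None:
        super().__init__(
            url="https://example.com/my_dataset.zip",  # placeholder URL
            file_name="my_dataset.zip",
            file_hash=None,  # skip SHA256 verification in this sketch
            extract_archive=True,
            download=download,
        )
        # The base class downloads (caching under ~/.cache/doctr/datasets,
        # per the deleted source above), extracts the archive, and leaves
        # self.data as an empty list to be populated with (sample, target)
        # entries by the subclass.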
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for a recognition task
+ detection_task: whether the dataset should be used for a detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# # List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
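The use_polygons branch added above converts a straight box in (xmin, ymin, xmax, ymax) order into its four corners, clockwise from the top left. A small worked sketch mirroring the list comprehension in the hunk:

# Straight box -> 4-corner polygon, following the conversion shown above
box = [10, 20, 50, 60]  # xmin, ymin, xmax, ymax
polygon = [
    [box[0], box[1]],  # top left     -> [10, 20]
    [box[2], box[1]],  # top right    -> [50, 20]
    [box[2], box[3]],  # bottom right -> [50, 60]
    [box[0], box[3]],  # bottom left  -> [10, 60]
]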
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before passing it to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
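The reworked DataLoader above replaces multithreaded fetching with a sequential map, and adds an optional collate_fn plus __len__ support. A short sketch, assuming the TensorFlow backend and the CORD dataset shown in the docstring example:

import tensorflow as tf
from doctr.datasets import CORD, DataLoader

train_set = CORD(train=True, download=True)

# Optional custom collate: stack image tensors, keep targets as a plain list
def collate(samples):
    images, targets = zip(*samples)
    return tf.stack(images, axis=0), list(targets)

loader = DataLoader(train_set, batch_size=32, shuffle=True, collate_fn=collate)
print(len(loader))                    # number of batches, via the new __len__
images, targets = next(iter(loader))  # one batch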
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
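The coordinate handling in the SROIE hunk above is worth unpacking: each annotation row carries 8 flat values, which are reshaped into a (4, 2) polygon and then, when use_polygons is False, collapsed into a straight (xmin, ymin, xmax, ymax) box via per-polygon min/max. A self-contained numpy sketch of that transform, with a made-up annotation row:

import numpy as np

rows = [["10", "20", "110", "20", "110", "60", "10", "60", "TOTAL"]]
coords = np.stack(
    [np.array(list(map(int, row[:8])), dtype=np.float32).reshape((4, 2)) for row in rows],
    axis=0,
)  # shape (N, 4, 2): top-left, top-right, bottom-right, bottom-left corners

# Straight-box fallback when use_polygons is False
straight = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
print(straight)  # [[ 10.  20. 110.  60.]]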
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char still not in vocab, return unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
-
-
+
+
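A hedged sketch of encode_sequences with the sos/pad options added above; eos, sos and pad must all lie outside the vocab's index range, and when pad is set an EOS symbol is appended before padding:

from doctr.datasets.utils import encode_sequences

vocab = "abc"  # a character's code is its position in the string
encoded = encode_sequences(
    sequences=["ab", "c"],
    vocab=vocab,
    target_size=6,
    eos=3,   # end-of-string symbol, outside indices 0..2
    sos=4,   # start-of-string symbol, prepended to each row
    pad=5,   # padding symbol, fills the tail after EOS
)
print(encoded.shape)  # (2, 6); first row is [4, 0, 1, 3, 5, 5]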
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
-
- doctr.documents.elements - docTR documentation
-
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
\ No newline at end of file
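The deleted module above defines the pre-v0.2 element hierarchy (it later re-homed under doctr.io.elements, whose diff appears further below): Words compose Lines, Lines and Artefacts compose Blocks, and export() walks the tree into nested dicts. A minimal sketch against the old API exactly as shown in the removed source:

from doctr.documents.elements import Word, Line

w1 = Word("Hello", 0.99, ((0.1, 0.1), (0.3, 0.15)))
w2 = Word("world", 0.95, ((0.35, 0.1), (0.6, 0.15)))
line = Line([w1, w2])  # geometry resolved to the smallest enclosing bbox

print(line.render())                       # "Hello world"
print(line.export()["words"][0]["value"])  # "Hello"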
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
-
- doctr.documents.reader - docTR documentation
-
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the opened fitz.Document, with pages ready to be rendered
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 pdf;
- to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
\ No newline at end of file
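Similarly, the removed reader module above exposes the pre-v0.2 document entry points: PDFs via PyMuPDF (fitz), web pages rendered to PDF via weasyprint, and images via OpenCV. A sketch against that old API as shown:

from doctr.documents import DocumentFile

pages = DocumentFile.from_images(["page1.png", "page2.png"])  # list of HxWx3 ndarrays
pdf = DocumentFile.from_pdf("doc.pdf")   # PDF wrapper around a fitz.Document
images = pdf.as_images()                 # render each page to a numpy ndarray
words = pdf.get_words()                  # native PDF text annotations per page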
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
-
- doctr.models.detection.differentiable_binarization - docTR documentation
-
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to unshrink (expand) polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: The first parameter.
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1):  # sum coarser maps into finer ones, top-down
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature maps is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon: array of coordinates defining the boundary of the polygon
- canvas: threshold map to fill with polygons
- mask: mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
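The offset distance above follows the DB paper's rule D = A * (1 - r^2) / L (area A, perimeter L, shrink ratio r). A hedged, self-contained sketch of the pyclipper offsetting step on a toy rectangle, using the same libraries as the code above (values illustrative):

import numpy as np
import pyclipper
from shapely.geometry import Polygon

shrink_ratio = 0.4
poly = np.array([[10, 10], [50, 10], [50, 30], [10, 30]])
shape = Polygon(poly)
# D = A * (1 - r^2) / L: here 800 * 0.84 / 120 = 5.6 pixels
distance = shape.area * (1 - shrink_ratio ** 2) / shape.length

offset = pyclipper.PyclipperOffset()
offset.AddPath([tuple(p) for p in poly], pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
# A positive delta dilates the polygon (support of the threshold map); the
# negative delta used in compute_target below shrinks it instead
padded = np.array(offset.Execute(distance)[0])
print(padded[:, 0].min(), padded[:, 1].min())  # roughly 10 - 5.6 on each axis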
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.float32)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of flags for each image. From there, it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
-
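The loss above combines three terms: a balanced BCE on the probability map (all positive pixels, but only the hardest negatives up to 3x the positive count, selected via top-k), a weighted dice loss on the differentiable binarization map b = 1 / (1 + exp(-50 * (P - T))), and an L1 term on the threshold map. A hedged sketch of the hard-negative-mining step in isolation (toy tensors, helper name chosen here):

import tensorflow as tf

def balanced_bce(per_pixel_loss, target, neg_ratio=3.0, eps=1e-6):
    # Keep every positive pixel but only the hardest `neg_ratio * positives` negatives
    pos = target
    neg = 1.0 - target
    pos_count = tf.reduce_sum(pos)
    neg_count = tf.minimum(tf.reduce_sum(neg), neg_ratio * pos_count)
    # Hardest negatives = the largest per-pixel losses on background pixels
    hard_neg, _ = tf.nn.top_k(tf.reshape(per_pixel_loss * neg, [-1]), tf.cast(neg_count, tf.int32))
    total = tf.reduce_sum(per_pixel_loss * pos) + tf.reduce_sum(hard_neg)
    return total / (pos_count + neg_count + eps)

# Toy example: 2 positives, 6 negatives -> all 6 negatives kept (6 <= 3 * 2)
target = tf.constant([1., 1., 0., 0., 0., 0., 0., 0.])
loss = tf.constant([0.1, 0.2, 0.9, 0.8, 0.1, 0.05, 0.02, 0.3])
print(balanced_bce(loss, target).numpy())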
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: probability map output by the LinkNet model
- bitmap: binarized map computed from pred
-
- Returns:
- numpy array of boxes for the bitmap, where each box is a 5-element row
- containing xmin, ymin, xmax, ymax, score
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num + 1):
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # skip components with fewer than 4 pixels
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
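A hedged toy illustration of the same connected-components post-processing (OpenCV and NumPy as above); note that label 0 is the background, so the foreground labels run from 1 to num_labels - 1:

import cv2
import numpy as np

bitmap = np.zeros((8, 8), dtype=np.uint8)
bitmap[1:4, 1:5] = 1   # first blob
bitmap[6:8, 5:8] = 1   # second blob

num_labels, label_img = cv2.connectedComponents(bitmap, connectivity=4)
boxes = []
for label in range(1, num_labels):
    # (x, y) pixel coordinates of this component
    points = np.array(np.where(label_img == label)[::-1]).T.astype(np.int32)
    x, y, w, h = cv2.boundingRect(points)
    boxes.append((x, y, w, h))
print(boxes)  # [(1, 1, 4, 3), (5, 6, 3, 2)]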
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and flags for each image,
- then compute the loss between the probability map and these targets
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
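Given the new signature above, a hedged usage sketch with the rotated-text options introduced in this diff (the random page stands in for a real document image):

import numpy as np
from doctr.models import detection_predictor

# Rotated-text setup: skip straight-box fitting, keep aspect-ratio padding
predictor = detection_predictor(
    arch="db_resnet50",
    pretrained=True,
    assume_straight_pages=False,
    preserve_aspect_ratio=True,
    symmetric_pad=True,
    batch_size=2,
)
page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
out = predictor([page])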
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized TFLite model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a TensorFlow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Representative dataset used to calibrate the full-integer quantization
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
-
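To consume the bytes returned by these helpers, the standard tf.lite.Interpreter flow applies; a hedged sketch with an illustrative toy model:

import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential, layers

# Any small Keras model works for the demonstration
model = Sequential([layers.Conv2D(8, 3, activation='relu', input_shape=(32, 32, 3))])
tflite_bytes = tf.lite.TFLiteConverter.from_keras_model(model).convert()

interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
interpreter.set_tensor(inp['index'], np.random.rand(1, 32, 32, 3).astype(np.float32))
interpreter.invoke()
print(interpreter.get_tensor(out['index']).shape)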
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
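A hedged toy walk-through of that decode: logits of shape (batch, time, classes + 1) are made time-major, greedy-decoded with repeat merging, and densified with the blank index as padding (vocabulary chosen here for illustration):

import tensorflow as tf

vocab = "ab"  # blank index = len(vocab) = 2
# One sequence of 4 timesteps favouring: a, a (merged), blank, b -> "ab"
logits = tf.constant([[[5., 0., 0.],
                       [5., 0., 0.],
                       [0., 0., 5.],
                       [0., 5., 0.]]])

decoded, _ = tf.nn.ctc_greedy_decoder(
    tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),  # time-major
    tf.fill([logits.shape[0]], logits.shape[1]),
    merge_repeated=True,
)
dense = tf.sparse.to_dense(decoded[0], default_value=len(vocab))
print("".join(vocab[i] for i in dense.numpy()[0] if i < len(vocab)))  # -> "ab"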
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
- with label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of ground-truth words to encode
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
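A hedged, dependency-light sketch of the glimpse computation: score every spatial position, softmax over all H * W of them, then use the attention map to pool the feature map (the random scores stand in for the projector stack above):

import tensorflow as tf

N, H, W, C = 1, 2, 3, 4
features = tf.random.uniform((N, H, W, C))
scores = tf.random.uniform((N, H, W, 1))  # stand-in for tanh(proj(h) + proj(f))

# Softmax over all spatial positions, then attention-weighted sum of features
attention = tf.nn.softmax(tf.reshape(scores, (N, H * W)))
attention_map = tf.reshape(attention, (N, H, W, 1))
glimpse = tf.reduce_sum(features * attention_map, axis=[1, 2])
print(glimpse.shape)  # (1, 4): one C-dimensional glimpse per sample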
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read: A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
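A hedged toy view of that masking: tf.sequence_mask zeroes the cross-entropy after each word's <eos> step, and the per-word loss is then normalized by its length:

import tensorflow as tf

# Per-timestep cross-entropy for 2 words over 4 timesteps
cce = tf.constant([[0.5, 0.4, 0.3, 0.2],
                   [0.6, 0.1, 0.9, 0.7]])
seq_len = tf.constant([2, 3])  # lengths including the <eos> step

mask = tf.sequence_mask(seq_len, maxlen=4)  # [[T, T, F, F], [T, T, T, F]]
masked = tf.where(mask, cce, tf.zeros_like(cce))
per_word = tf.reduce_sum(masked, axis=1) / tf.cast(seq_len, tf.float32)
print(per_word.numpy())  # [0.45, 0.5333...]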
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read: A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read: A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
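For context, a minimal usage sketch of the updated signature (the crop is a dummy array, and fetching the pretrained weights is assumed to work):

>>> import numpy as np
>>> from doctr.models import recognition_predictor
>>> predictor = recognition_predictor("crnn_vgg16_bn", pretrained=True, symmetric_pad=True, batch_size=32)
>>> crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)
>>> out = predictor([crop])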
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
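As a sketch of the new options (not a canonical recipe), a predictor tuned for rotated documents could combine the added flags like this:

>>> from doctr.models import ocr_predictor
>>> model = ocr_predictor(
>>>     "db_resnet50",
>>>     "crnn_vgg16_bn",
>>>     pretrained=True,
>>>     assume_straight_pages=False,
>>>     export_as_straight_boxes=True,
>>>     detect_orientation=True,
>>> )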
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `KIEPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
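A short sketch of the new KIE entry point; the per-page `predictions` mapping is an assumption taken from the KIE page export used further down in this changeset:

>>> import numpy as np
>>> from doctr.models import kie_predictor
>>> model = kie_predictor("db_resnet50", "crnn_vgg16_bn", pretrained=True)
>>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([input_page])
>>> predictions = out.pages[0].predictions  # mapping from class name to detected elements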
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
-
- doctr.transforms.modules - docTR documentation
-
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Brightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- p: probability to apply transformation
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Contrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Saturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Hue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Gamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = JpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = OneOf([JpegQuality(), Gamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = RandomApply(Gamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
doctr.transforms.modules.base - docTR documentation
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
doctr.transforms.modules.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
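A quick sketch of the four tolerance levels (anyascii transliterates 'è' to 'e', so only the unicase comparison matches here):

>>> from doctr.utils.metrics import string_match
>>> string_match("Crème", "creme")
(False, False, False, True)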
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
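Following the ordering remark in the code above, a sketch of the summary output for a pair containing '€' (anyascii renders it as 'EUR'):

>>> from doctr.utils import TextMatch
>>> metric = TextMatch()
>>> metric.update(["Hello", "EUR"], ["hello", "€"])
>>> metric.summary()
{'raw': 0.0, 'caseless': 0.5, 'anyascii': 0.5, 'unicase': 1.0}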
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
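For intuition, a sketch with two partially overlapping boxes (box_iou is module-level in doctr.utils.metrics):

>>> import numpy as np
>>> from doctr.utils.metrics import box_iou
>>> iou = box_iou(np.array([[0, 0, 2, 2]]), np.array([[1, 1, 3, 3]]))
>>> float(iou[0, 0])  # intersection 1 over union 7, roughly 0.143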
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
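A sketch with a unit square tested against itself:

>>> import numpy as np
>>> from doctr.utils.metrics import polygon_iou
>>> sq = np.array([[[0, 0], [1, 0], [1, 1], [0, 1]]], dtype=np.float32)
>>> float(polygon_iou(sq, sq)[0, 0])  # identical polygons give an IoU of 1
1.0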
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: IoU threshold above which overlapping boxes are suppressed.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
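A sketch of the suppression behavior: the second box nearly coincides with the highest-scoring one (IoU 0.81) and is dropped, while the disjoint third box survives:

>>> import numpy as np
>>> from doctr.utils.metrics import nms
>>> boxes = np.array([
>>>     [0, 0, 10, 10, 0.9],
>>>     [1, 1, 10, 10, 0.8],
>>>     [20, 20, 30, 30, 0.7],
>>> ])
>>> nms(boxes, thresh=0.5)  # kept indices: [0, 2]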
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
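Continuing the OCRMetric docstring example above, summary() now unpacks into two per-tolerance dictionaries and a scalar (a sketch of the shape, not of exact figures):

>>> recall, precision, mean_iou = metric.summary()
>>> recall["raw"], precision["unicase"], mean_iou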
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall, precision and mean IoU for class-aware detection
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
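DetectionMetric.summary() follows the same pattern with plain floats; per the guards above, it returns None entries until update() has seen data:

>>> metric.reset()
>>> metric.summary()
(None, None, None)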
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor used preserve_aspect_ratio=True, so coordinates map onto the padded square page
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor used preserve_aspect_ratio=True, so coordinates map onto the padded square page
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
+
+
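A sketch dispatching both geometry formats through create_obj_patch (a module-level helper in doctr.utils.visualization; the coordinates are illustrative):

>>> import numpy as np
>>> from doctr.utils.visualization import create_obj_patch
>>> straight = create_obj_patch(((0.1, 0.1), (0.4, 0.3)), (600, 800), color=(0, 0, 1))
>>> rotated = create_obj_patch(np.array([[0.1, 0.1], [0.4, 0.1], [0.4, 0.3], [0.1, 0.3]]), (600, 800))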
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
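A sketch of how the palette pairs with prediction classes, mirroring its use in visualize_kie_page below (the class names are hypothetical):

>>> from doctr.utils.visualization import get_colors
>>> classes = ["names", "dates", "totals"]  # hypothetical KIE classes
>>> colors = {k: c for c, k in zip(get_colors(len(classes)), classes)}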
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
 # Create mpl Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, which needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mpl Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
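+
+
+# Illustrative usage of `draw_boxes` (hypothetical values, not part of the library source):
+# boxes are relative (xmin, ymin, xmax, ymax) coordinates in [0, 1].
+# >>> import numpy as np
+# >>> img = np.zeros((200, 300, 3), dtype=np.uint8)
+# >>> boxes = np.array([[0.1, 0.2, 0.4, 0.5], [0.5, 0.1, 0.9, 0.4]])
+# >>> draw_boxes(boxes, img)  # rectangles are drawn on the image, then displayed with matplotlib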
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
-
-
+
+
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
-
-
+
+
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-.. autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
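-
-For instance (a short sketch mirroring the loader's docstring example):
-
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = FUNSD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> images, targets = next(iter(train_loader))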
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
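-
-For instance (a sketch; the vocab string can be any of the sets listed above):
-
- >>> from doctr.datasets import encode_sequences
- >>> encoded = encode_sequences(["hello", "world"], vocab="abcdefghijklmnopqrstuvwxyz", target_size=10)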
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, text at the same height in the two columns forms two distinct Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developer mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Whether performed jointly or separately, each task corresponds to a specific type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used an AWS c5.12xlarge instance (CPU Xeon Platinum 8275L) to perform the experiments.
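-
-In code, this measurement looks roughly as follows (a sketch, not the exact benchmark script; ``model`` stands for any instantiated detection model):
-
- >>> import time
- >>> import tensorflow as tf
- >>> inp = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> for _ in range(100): _ = model(inp, training=False)  # warm-up
- >>> start = time.perf_counter()
- >>> for _ in range(1000): _ = model(inp, training=False)
- >>> fps = 1000 / (time.perf_counter() - start)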
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
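-
-A minimal sketch of this scheme (illustrative only: the normalization statistics below are placeholders, not DocTR's actual training statistics):
-
- >>> import tensorflow as tf
- >>> def preprocess_for_detection(images, target_size=(1024, 1024)):
- ...     # 1. resize each image (bilinear interpolation, with potential deformation)
- ...     resized = [tf.image.resize(img, target_size, method="bilinear") for img in images]
- ...     # 2. batch images together
- ...     batch = tf.stack(resized, axis=0)
- ...     # 3. normalize the batch (placeholder mean/std of 0.5)
- ...     return (batch / 255.0 - 0.5) / 0.5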
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (a binary segmentation map, for instance) into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors let you pass numpy images as inputs and get structured information in return.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 30595 word-level crops, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used an AWS c5.12xlarge instance (CPU Xeon Platinum 8275L) to perform the experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
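-
-A minimal sketch of this scheme (illustrative only: padding behavior and statistics below are placeholders):
-
- >>> import tensorflow as tf
- >>> def preprocess_for_recognition(crops, target_size=(32, 128)):
- ...     # 1. & 2. resize while preserving aspect ratio, then zero-pad to the target size
- ...     resized = [tf.image.resize_with_pad(crop, *target_size) for crop in crops]
- ...     # 3. batch crops together
- ...     batch = tf.stack(resized, axis=0)
- ...     # 4. normalize the batch (placeholder mean/std of 0.5)
- ...     return (batch / 255.0 - 0.5) / 0.5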
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence) into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed as follows: we instantiate the predictor, warm up the model, and then measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used an AWS c5.12xlarge instance (CPU Xeon Platinum 8275L) to perform the experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection output is used to produce cropped images that are passed to the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
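-
-For instance (a sketch using the default pretrained detection & recognition components):
-
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])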
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
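-
-For example (a sketch; each helper takes a trained TensorFlow model and returns a compressed, serialized version of it):
-
- >>> from doctr.models import db_resnet50
- >>> from doctr.models.export import convert_to_fp16
- >>> model = db_resnet50(pretrained=True)
- >>> serialized_model = convert_to_fp16(model)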
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedures. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
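-
-For instance (an illustrative composition; ``my_image`` is a placeholder input tensor):
-
- >>> from doctr.transforms import Compose, Resize, ToGray, RandomApply
- >>> transfo = Compose([Resize((512, 512)), RandomApply(ToGray(), p=0.3)])
- >>> out = transfo(my_image)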
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module gathers non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model's performance.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
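-
-For instance (a sketch of the usual update/summary workflow, assuming the metric is fed ground truths then predictions):
-
- >>> from doctr.utils.metrics import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()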
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
- doctr.datasets - docTR documentation
-
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = CORD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-¶
-
-
-
-
-
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as a mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
- doctr.documents - docTR documentation
-
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page's size
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, text at the same height in the two columns forms two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF as a bytes stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
(The remaining hunks repeat the same pattern — stripped <head> markup and footer lines with no visible text changes — across the Contents navigation above (AWS Lambda entry), latest/using_doctr/sharing_models.html, using_contrib_modules.html, using_datasets.html, using_model_export.html and using_models.html, and the top-level modules/contrib.html, modules/datasets.html, modules/io.html, modules/models.html, modules/transforms.html, modules/utils.html, notebooks.html and search.html.)
diff --git a/searchindex.js b/searchindex.js
index 8598997441..df18967072 100644
(regenerated Sphinx search index: a single-line, machine-generated JavaScript blob)
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[1, "correction"]], "2. Warning": [[1, "warning"]], "3. Temporary Ban": [[1, "temporary-ban"]], "4. Permanent Ban": [[1, "permanent-ban"]], "AWS Lambda": [[13, null]], "Advanced options": [[18, "advanced-options"]], "Args:": [[6, "args"], [6, "id4"], [6, "id7"], [6, "id10"], [6, "id13"], [6, "id16"], [6, "id19"], [6, "id22"], [6, "id25"], [6, "id29"], [6, "id32"], [6, "id37"], [6, "id40"], [6, "id46"], [6, "id49"], [6, "id50"], [6, "id51"], [6, "id54"], [6, "id57"], [6, "id60"], [6, "id61"], [7, "args"], [7, "id2"], [7, "id3"], [7, "id4"], [7, "id5"], [7, "id6"], [7, "id7"], [7, "id10"], [7, "id12"], [7, "id14"], [7, "id16"], [7, "id20"], [7, "id24"], [7, "id28"], [8, "args"], [8, "id3"], [8, "id8"], [8, "id13"], [8, "id17"], [8, "id21"], [8, "id26"], [8, "id31"], [8, "id36"], [8, "id41"], [8, "id46"], [8, "id50"], [8, "id54"], [8, "id59"], [8, "id63"], [8, "id68"], [8, "id73"], [8, "id77"], [8, "id81"], [8, "id85"], [8, "id90"], [8, "id95"], [8, "id99"], [8, "id104"], [8, "id109"], [8, "id114"], [8, "id119"], [8, "id123"], [8, "id127"], [8, "id132"], [8, "id137"], [8, "id142"], [8, "id146"], [8, "id150"], [8, "id155"], [8, "id159"], [8, "id163"], [8, "id167"], [8, "id169"], [8, "id171"], [8, "id173"], [9, "args"], [9, "id1"], [9, "id2"], [9, "id3"], [9, "id4"], [9, "id5"], [9, "id6"], [9, "id7"], [9, "id8"], [9, "id9"], [9, "id10"], [9, "id11"], [9, "id12"], [9, "id13"], [9, "id14"], [9, "id15"], [9, "id16"], [9, "id17"], [9, "id18"], [9, "id19"], [10, "args"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"]], "Artefact": [[7, "artefact"]], "ArtefactDetection": [[15, "artefactdetection"]], "Attribution": [[1, "attribution"]], "Available Datasets": [[16, "available-datasets"]], "Available architectures": [[18, "available-architectures"], [18, "id1"], [18, "id2"]], "Available contribution modules": [[15, "available-contribution-modules"]], "Block": [[7, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[16, null]], "Choosing the right model": [[18, null]], "Classification": [[14, "classification"]], "Code quality": [[2, "code-quality"]], "Code style verification": [[2, "code-style-verification"]], "Codebase structure": [[2, "codebase-structure"]], "Commits": [[2, "commits"]], "Composing transformations": [[9, "composing-transformations"]], "Continuous Integration": [[2, "continuous-integration"]], "Contributing to docTR": [[2, null]], "Contributor Covenant Code of Conduct": [[1, null]], "Custom dataset loader": [[6, "custom-dataset-loader"]], "Custom orientation classification models": [[12, "custom-orientation-classification-models"]], "Data Loading": [[16, "data-loading"]], "Dataloader": [[6, "dataloader"]], "Detection": [[14, "detection"], [16, "detection"]], "Detection predictors": [[18, "detection-predictors"]], "Developer mode installation": [[2, "developer-mode-installation"]], "Developing docTR": [[2, "developing-doctr"]], "Document": [[7, "document"]], "Document structure": [[7, "document-structure"]], "End-to-End OCR": [[18, "end-to-end-ocr"]], "Enforcement": [[1, "enforcement"]], "Enforcement Guidelines": [[1, "enforcement-guidelines"]], "Enforcement Responsibilities": [[1, "enforcement-responsibilities"]], "Export to ONNX": [[17, "export-to-onnx"]], "Feature requests & bug report": [[2, "feature-requests-bug-report"]], "Feedback": [[2, "feedback"]], "File reading": [[7, "file-reading"]], "Half-precision": [[17, "half-precision"]], "Installation": [[3, null]], 
"Integrate contributions into your pipeline": [[15, null]], "Let\u2019s connect": [[2, "let-s-connect"]], "Line": [[7, "line"]], "Loading from Huggingface Hub": [[14, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[12, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[12, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[4, "main-features"]], "Model optimization": [[17, "model-optimization"]], "Model zoo": [[4, "model-zoo"]], "Modifying the documentation": [[2, "modifying-the-documentation"]], "Naming conventions": [[14, "naming-conventions"]], "OCR": [[16, "ocr"]], "Object Detection": [[16, "object-detection"]], "Our Pledge": [[1, "our-pledge"]], "Our Standards": [[1, "our-standards"]], "Page": [[7, "page"]], "Preparing your model for inference": [[17, null]], "Prerequisites": [[3, "prerequisites"]], "Pretrained community models": [[14, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[14, "pushing-to-the-huggingface-hub"]], "Questions": [[2, "questions"]], "Recognition": [[14, "recognition"], [16, "recognition"]], "Recognition predictors": [[18, "recognition-predictors"]], "Returns:": [[6, "returns"], [7, "returns"], [7, "id11"], [7, "id13"], [7, "id15"], [7, "id19"], [7, "id23"], [7, "id27"], [7, "id31"], [8, "returns"], [8, "id6"], [8, "id11"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id29"], [8, "id34"], [8, "id39"], [8, "id44"], [8, "id49"], [8, "id53"], [8, "id57"], [8, "id62"], [8, "id66"], [8, "id71"], [8, "id76"], [8, "id80"], [8, "id84"], [8, "id88"], [8, "id93"], [8, "id98"], [8, "id102"], [8, "id107"], [8, "id112"], [8, "id117"], [8, "id122"], [8, "id126"], [8, "id130"], [8, "id135"], [8, "id140"], [8, "id145"], [8, "id149"], [8, "id153"], [8, "id158"], [8, "id162"], [8, "id166"], [8, "id168"], [8, "id170"], [8, "id172"], [10, "returns"]], "Scope": [[1, "scope"]], "Share your model with the community": [[14, null]], "Supported Vocabs": [[6, "supported-vocabs"]], "Supported contribution modules": [[5, "supported-contribution-modules"]], "Supported datasets": [[4, "supported-datasets"]], "Supported transformations": [[9, "supported-transformations"]], "Synthetic dataset generator": [[6, "synthetic-dataset-generator"], [16, "synthetic-dataset-generator"]], "Task evaluation": [[10, "task-evaluation"]], "Text Detection": [[18, "text-detection"]], "Text Recognition": [[18, "text-recognition"]], "Text detection models": [[4, "text-detection-models"]], "Text recognition models": [[4, "text-recognition-models"]], "Train your own model": [[12, null]], "Two-stage approaches": [[18, "two-stage-approaches"]], "Unit tests": [[2, "unit-tests"]], "Use your own datasets": [[16, "use-your-own-datasets"]], "Using your ONNX exported model": [[17, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[3, "via-conda-only-for-linux"]], "Via Git": [[3, "via-git"]], "Via Python Package": [[3, "via-python-package"]], "Visualization": [[10, "visualization"]], "What should I do with the output?": [[18, "what-should-i-do-with-the-output"]], "Word": [[7, "word"]], "docTR Notebooks": [[11, null]], "docTR Vocabs": [[6, "id62"]], "docTR: Document Text Recognition": [[4, null]], "doctr.contrib": [[5, null]], "doctr.datasets": [[6, null], [6, "datasets"]], "doctr.io": [[7, null]], "doctr.models": [[8, null]], "doctr.models.classification": [[8, "doctr-models-classification"]], "doctr.models.detection": [[8, "doctr-models-detection"]], "doctr.models.factory": [[8, 
"doctr-models-factory"]], "doctr.models.recognition": [[8, "doctr-models-recognition"]], "doctr.models.zoo": [[8, "doctr-models-zoo"]], "doctr.transforms": [[9, null]], "doctr.utils": [[10, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[7, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[7, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[9, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[6, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[9, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[9, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[6, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[6, "doctr.datasets.loader.DataLoader", false]], 
"db_mobilenet_v3_large() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[8, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[6, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[6, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[7, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[7, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[6, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[6, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[9, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[9, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[6, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[6, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[6, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[6, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[6, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[8, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[9, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[7, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[6, "doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() 
(in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[9, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[8, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[6, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[9, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[7, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[9, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[9, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[9, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[9, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[9, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[9, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[9, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[9, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[9, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[9, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[9, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[9, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[7, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[7, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[7, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[6, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[9, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[8, 
"doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[7, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[7, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[6, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[6, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[6, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[6, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[9, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[10, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[6, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[7, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[6, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[6, 0, 1, "", "CORD"], [6, 0, 1, "", "CharacterGenerator"], [6, 0, 1, "", "DetectionDataset"], [6, 0, 1, "", "DocArtefacts"], [6, 0, 1, "", "FUNSD"], [6, 0, 1, "", "IC03"], [6, 0, 1, "", "IC13"], [6, 0, 1, "", "IIIT5K"], [6, 0, 1, "", "IIITHWS"], [6, 0, 1, "", "IMGUR5K"], [6, 0, 1, "", "MJSynth"], [6, 0, 1, "", "OCRDataset"], [6, 0, 1, "", "RecognitionDataset"], [6, 0, 1, "", "SROIE"], [6, 0, 1, "", "SVHN"], [6, 0, 1, "", "SVT"], [6, 0, 1, "", "SynthText"], [6, 0, 1, "", 
"WILDRECEIPT"], [6, 0, 1, "", "WordGenerator"], [6, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[6, 0, 1, "", "DataLoader"]], "doctr.io": [[7, 0, 1, "", "Artefact"], [7, 0, 1, "", "Block"], [7, 0, 1, "", "Document"], [7, 0, 1, "", "DocumentFile"], [7, 0, 1, "", "Line"], [7, 0, 1, "", "Page"], [7, 0, 1, "", "Word"], [7, 1, 1, "", "decode_img_as_tensor"], [7, 1, 1, "", "read_html"], [7, 1, 1, "", "read_img_as_numpy"], [7, 1, 1, "", "read_img_as_tensor"], [7, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[7, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[7, 2, 1, "", "from_images"], [7, 2, 1, "", "from_pdf"], [7, 2, 1, "", "from_url"]], "doctr.io.Page": [[7, 2, 1, "", "show"]], "doctr.models": [[8, 1, 1, "", "kie_predictor"], [8, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[8, 1, 1, "", "crop_orientation_predictor"], [8, 1, 1, "", "magc_resnet31"], [8, 1, 1, "", "mobilenet_v3_large"], [8, 1, 1, "", "mobilenet_v3_large_r"], [8, 1, 1, "", "mobilenet_v3_small"], [8, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [8, 1, 1, "", "mobilenet_v3_small_page_orientation"], [8, 1, 1, "", "mobilenet_v3_small_r"], [8, 1, 1, "", "page_orientation_predictor"], [8, 1, 1, "", "resnet18"], [8, 1, 1, "", "resnet31"], [8, 1, 1, "", "resnet34"], [8, 1, 1, "", "resnet50"], [8, 1, 1, "", "textnet_base"], [8, 1, 1, "", "textnet_small"], [8, 1, 1, "", "textnet_tiny"], [8, 1, 1, "", "vgg16_bn_r"], [8, 1, 1, "", "vit_b"], [8, 1, 1, "", "vit_s"]], "doctr.models.detection": [[8, 1, 1, "", "db_mobilenet_v3_large"], [8, 1, 1, "", "db_resnet50"], [8, 1, 1, "", "detection_predictor"], [8, 1, 1, "", "fast_base"], [8, 1, 1, "", "fast_small"], [8, 1, 1, "", "fast_tiny"], [8, 1, 1, "", "linknet_resnet18"], [8, 1, 1, "", "linknet_resnet34"], [8, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[8, 1, 1, "", "from_hub"], [8, 1, 1, "", "login_to_hub"], [8, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[8, 1, 1, "", "crnn_mobilenet_v3_large"], [8, 1, 1, "", "crnn_mobilenet_v3_small"], [8, 1, 1, "", "crnn_vgg16_bn"], [8, 1, 1, "", "master"], [8, 1, 1, "", "parseq"], [8, 1, 1, "", "recognition_predictor"], [8, 1, 1, "", "sar_resnet31"], [8, 1, 1, "", "vitstr_base"], [8, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[9, 0, 1, "", "ChannelShuffle"], [9, 0, 1, "", "ColorInversion"], [9, 0, 1, "", "Compose"], [9, 0, 1, "", "GaussianBlur"], [9, 0, 1, "", "GaussianNoise"], [9, 0, 1, "", "LambdaTransformation"], [9, 0, 1, "", "Normalize"], [9, 0, 1, "", "OneOf"], [9, 0, 1, "", "RandomApply"], [9, 0, 1, "", "RandomBrightness"], [9, 0, 1, "", "RandomContrast"], [9, 0, 1, "", "RandomCrop"], [9, 0, 1, "", "RandomGamma"], [9, 0, 1, "", "RandomHorizontalFlip"], [9, 0, 1, "", "RandomHue"], [9, 0, 1, "", "RandomJpegQuality"], [9, 0, 1, "", "RandomResize"], [9, 0, 1, "", "RandomRotate"], [9, 0, 1, "", "RandomSaturation"], [9, 0, 1, "", "RandomShadow"], [9, 0, 1, "", "Resize"], [9, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[10, 0, 1, "", "DetectionMetric"], [10, 0, 1, "", "LocalizationConfusion"], [10, 0, 1, "", "OCRMetric"], [10, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.OCRMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.visualization": [[10, 1, 1, "", "visualize_page"]]}, 
"objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [1, 7, 8, 10, 14, 17], "0": [1, 3, 6, 9, 10, 12, 15, 16, 18], "00": 18, "01": 18, "0123456789": 6, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "02562": 8, "03": 18, "035": 18, "0361328125": 18, "04": 18, "05": 18, "06": 18, "06640625": 18, "07": 18, "08": [9, 18], "09": 18, "0966796875": 18, "1": [6, 7, 8, 9, 10, 12, 16, 18], "10": [6, 10, 18], "100": [6, 9, 10, 16, 18], "1000": 18, "101": 6, "1024": [8, 12, 18], "104": 6, "106": 6, "108": 6, "1095": 16, "11": 18, "110": 10, "1107": 16, "114": 6, "115": 6, "1156": 16, "116": 6, "118": 6, "11800h": 18, "11th": 18, "12": 18, "120": 6, "123": 6, "126": 6, "1268": 16, "128": [8, 12, 17, 18], "13": 18, "130": 6, "13068": 16, "131": 6, "1337891": 16, "1357421875": 18, "1396484375": 18, "14": 18, "1420": 18, "14470v1": 6, "149": 16, "15": 18, "150": [10, 18], "1552": 18, "16": [8, 17, 18], "1630859375": 18, "1684": 18, "16x16": 8, "17": 18, "1778": 18, "1782": 18, "18": [8, 18], "185546875": 18, "1900": 18, "1910": 8, "19342": 16, "19370": 16, "195": 6, "19598": 16, "199": 18, "1999": 18, "2": [3, 4, 6, 7, 9, 15, 18], "20": 18, "200": 10, "2000": 16, "2003": [4, 6], "2012": 6, "2013": [4, 6], "2015": 6, "2019": 4, "207901": 16, "21": 18, "2103": 6, "2186": 16, "21888": 16, "22": 18, "224": [8, 9], "225": 9, "22672": 16, "229": [9, 16], "23": 18, "233": 16, "236": 6, "24": 18, "246": 16, "249": 16, "25": 18, "2504": 18, "255": [7, 8, 9, 10, 18], "256": 8, "257": 16, "26": 18, "26032": 16, "264": 12, "27": 18, "2700": 16, "2710": 18, "2749": 12, "28": 18, "287": 12, "29": 18, "296": 12, "299": 12, "2d": 18, "3": [3, 4, 7, 8, 9, 10, 17, 18], "30": 18, "300": 16, "3000": 16, "301": 12, "30595": 18, "30ghz": 18, "31": 8, "32": [6, 8, 9, 12, 16, 17, 18], "3232421875": 18, "33": [9, 18], "33402": 16, "33608": 16, "34": [8, 18], "340": 18, "3456": 18, "3515625": 18, "36": 18, "360": 16, "37": [6, 18], "38": 18, "39": 18, "4": [8, 9, 10, 18], "40": 18, "406": 9, "41": 18, "42": 18, "43": 18, "44": 18, "45": 18, "456": 9, "46": 18, "47": 18, "472": 16, "48": [6, 18], "485": 9, "49": 18, "49377": 16, "5": [6, 9, 10, 15, 18], "50": [8, 16, 18], "51": 18, "51171875": 18, "512": 8, "52": [6, 18], "529": 18, "53": 18, "54": 18, "540": 18, "5478515625": 18, "55": 18, "56": 18, "57": 18, "58": [6, 18], "580": 18, "5810546875": 18, "583": 18, "59": 18, "597": 18, "5k": [4, 6], "5m": 18, "6": [9, 18], "60": 9, "600": [8, 10, 18], "61": 18, "62": 18, "626": 16, "63": 18, "64": [8, 9, 18], "641": 18, "647": 16, "65": 18, "66": 18, "67": 18, "68": 18, "69": 18, "693": 12, "694": 12, "695": 12, "6m": 18, "7": 18, "70": [6, 10, 18], "707470": 16, "71": [6, 18], "7100000": 16, "7141797": 16, "7149": 16, "72": 18, "72dpi": 7, "73": 18, "73257": 16, "74": 18, "75": [9, 18], "7581382": 16, "76": 18, "77": 18, "772": 12, "772875": 16, "78": 18, "785": 12, "79": 18, "793533": 16, "796": 16, "798": 12, "7m": 18, "8": [8, 9, 18], "80": 18, "800": [8, 10, 16, 18], "81": 18, "82": 18, "83": 18, "84": 18, "849": 16, "85": 18, "8564453125": 18, "857": 18, "85875": 16, "86": 18, "8603515625": 18, "87": 18, "8707": 16, "88": 18, "89": 18, "9": [3, 9, 18], "90": 18, "90k": 6, "90kdict32px": 6, "91": 18, "914085328578949": 18, "92": 18, "93": 18, "94": [6, 18], "95": 
[10, 18], "9578408598899841": 18, "96": 18, "97": 18, "98": 18, "99": 18, "9949972033500671": 18, "A": [1, 2, 4, 6, 7, 8, 11, 17], "As": 2, "Be": 18, "Being": 1, "By": 13, "For": [1, 2, 3, 12, 18], "If": [2, 7, 8, 12, 18], "In": [2, 6, 16], "It": [9, 14, 15, 17], "Its": [4, 8], "No": [1, 18], "Of": 6, "Or": [15, 17], "The": [1, 2, 6, 7, 10, 13, 15, 16, 17, 18], "Then": 8, "To": [2, 3, 13, 14, 15, 17, 18], "_": [1, 6, 8], "__call__": 18, "_build": 2, "_i": 10, "ab": 6, "abc": 17, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "abdef": [6, 16], "abl": [16, 18], "about": [1, 16, 18], "abov": 18, "abstractdataset": 6, "abus": 1, "accept": 1, "access": [4, 7, 16, 18], "account": [1, 14], "accur": 18, "accuraci": 10, "achiev": 17, "act": 1, "action": 1, "activ": 4, "ad": [2, 8, 9], "adapt": 1, "add": [9, 10, 14, 18], "add_hook": 18, "add_label": 10, "addit": [2, 3, 7, 15, 18], "addition": [2, 18], "address": [1, 7], "adjust": 9, "advanc": 1, "advantag": 17, "advis": 2, "aesthet": [4, 6], "affect": 1, "after": [14, 18], "ag": 1, "again": 8, "aggreg": [10, 16], "aggress": 1, "align": [1, 7, 9], "all": [1, 2, 5, 6, 7, 9, 10, 15, 16, 18], "allow": [1, 17], "along": 18, "alreadi": [2, 17], "also": [1, 8, 14, 15, 16, 18], "alwai": 16, "an": [1, 2, 4, 6, 7, 8, 10, 15, 17, 18], "analysi": [7, 15], "ancient_greek": 6, "angl": [7, 9], "ani": [1, 6, 7, 8, 9, 10, 17, 18], "annot": 6, "anot": 16, "anoth": [8, 12, 16], "answer": 1, "anyascii": 10, "anyon": 4, "anyth": 15, "api": [2, 4], "apolog": 1, "apologi": 1, "app": 2, "appear": 1, "appli": [1, 6, 9], "applic": [4, 8], "appoint": 1, "appreci": 14, "appropri": [1, 2, 18], "ar": [1, 2, 3, 5, 6, 7, 9, 10, 11, 15, 16, 18], "arab": 6, "arabic_diacrit": 6, "arabic_lett": 6, "arabic_punctu": 6, "arbitrarili": [4, 8], "arch": [8, 14], "architectur": [4, 8, 14, 15], "area": 18, "argument": [6, 7, 8, 10, 12, 18], "around": 1, "arrai": [7, 9, 10], "art": [4, 15], "artefact": [10, 15, 18], "artefact_typ": 7, "artifici": [4, 6], "arxiv": [6, 8], "asarrai": 10, "ascii_lett": 6, "aspect": [4, 8, 9, 18], "assess": 10, "assign": 10, "associ": 7, "assum": 8, "assume_straight_pag": [8, 12, 18], "astyp": [8, 10, 18], "attack": 1, "attend": [4, 8], "attent": [1, 8], "autom": 4, "automat": 18, "autoregress": [4, 8], "avail": [1, 4, 5, 9], "averag": [9, 18], "avoid": [1, 3], "aw": [4, 18], "awar": 18, "azur": 18, "b": [8, 10, 18], "b_j": 10, "back": 2, "backbon": 8, "backend": 18, "background": 16, "bangla": 6, "bar": 15, "bar_cod": 16, "base": [4, 8, 15], "baselin": [4, 8, 18], "batch": [6, 8, 9, 15, 16, 18], "batch_siz": [6, 12, 15, 16, 17], "bblanchon": 3, "bbox": 18, "becaus": 13, "been": [2, 10, 16, 18], "befor": [6, 8, 9, 18], "begin": 10, "behavior": [1, 18], "being": [10, 18], "belong": 18, "benchmark": 18, "best": 1, "better": [11, 18], "between": [9, 10, 18], "bgr": 7, "bilinear": 9, "bin_thresh": 18, "binar": [4, 8, 18], "binari": [7, 17, 18], "bit": 17, "block": [10, 18], "block_1_1": 18, "blur": 9, "bmvc": 6, "bn": 14, "bodi": [1, 18], "bool": [6, 7, 8, 9, 10], "boolean": [8, 18], "both": [4, 6, 9, 16, 18], "bottom": [8, 18], "bound": [6, 7, 8, 9, 10, 15, 16, 18], "box": [6, 7, 8, 9, 10, 15, 16, 18], "box_thresh": 18, "bright": 9, "browser": [2, 4], "build": [2, 3, 17], "built": 2, "byte": [7, 18], "c": [3, 7, 10], "c_j": 10, "cach": [2, 6, 13], "cache_sampl": 6, "call": 17, "callabl": [6, 9], "can": [2, 3, 12, 13, 14, 15, 16, 18], "capabl": [2, 11, 18], "case": [6, 10], "cf": 18, "cfg": 18, "challeng": 6, "challenge2_test_task12_imag": 6, 
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
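The removed blob above is the tail of the previous build's entire Sphinx search index, serialized as a single `Search.setIndex({...})` call. Minified as it is, the layout is regular: a word table (Sphinx's `terms` mapping) keys each stemmed token to the document(s) containing it, and those numbers index into the parallel `docnames` and `titles` arrays found elsewhere in the same object. A minimal lookup sketch against such a payload follows; the helper is purely illustrative and exists in neither docTR nor Sphinx.

// Illustrative only: resolving one stemmed token against a Sphinx search index.
// Assumes the stock searchindex.js keys (terms, docnames, titles).
function pagesForTerm(index, term) {
  const raw = index.terms[term];
  // Single-document hits are stored as a bare index ("page_param": 12 above),
  // multi-document hits as an array ("page_idx": [7, 18]).
  const hits = raw === undefined ? [] : Array.isArray(raw) ? raw : [raw];
  return hits.map((i) => ({ doc: index.docnames[i], title: index.titles[i] }));
}

Hits resolved this way are what searchtools.js (patched later in this diff) assembles into its per-result tuples.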
diff --git a/using_doctr/custom_models_training.html b/using_doctr/custom_models_training.html
index 580b4368b7..e664c6a950 100644
--- a/using_doctr/custom_models_training.html
+++ b/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -615,7 +615,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/using_doctr/running_on_aws.html b/using_doctr/running_on_aws.html
index ddb0c3c80f..81c38b49f5 100644
--- a/using_doctr/running_on_aws.html
+++ b/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -358,7 +358,7 @@ AWS Lambda
-
+
diff --git a/using_doctr/sharing_models.html b/using_doctr/sharing_models.html
index 07a3b2f2a3..4f5d1d68a5 100644
--- a/using_doctr/sharing_models.html
+++ b/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -540,7 +540,7 @@ Recognition
-
+
diff --git a/using_doctr/using_contrib_modules.html b/using_doctr/using_contrib_modules.html
index b4a10925e6..cf282ff3a4 100644
--- a/using_doctr/using_contrib_modules.html
+++ b/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -411,7 +411,7 @@ ArtefactDetection
-
+
diff --git a/using_doctr/using_datasets.html b/using_doctr/using_datasets.html
index 4a52df36ba..e30b6d6459 100644
--- a/using_doctr/using_datasets.html
+++ b/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -638,7 +638,7 @@ Data Loading
-
+
diff --git a/using_doctr/using_model_export.html b/using_doctr/using_model_export.html
index 2b30ee63a1..ad9d09ed4c 100644
--- a/using_doctr/using_model_export.html
+++ b/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -463,7 +463,7 @@ Using your ONNX exported model
-
+
diff --git a/using_doctr/using_models.html b/using_doctr/using_models.html
index 13cb06116b..5c80dbf62d 100644
--- a/using_doctr/using_models.html
+++ b/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1249,7 +1249,7 @@ Advanced options
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/cord.html b/v0.1.0/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.0/_modules/doctr/datasets/cord.html
+++ b/v0.1.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/detection.html b/v0.1.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.0/_modules/doctr/datasets/detection.html
+++ b/v0.1.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/funsd.html b/v0.1.0/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.0/_modules/doctr/datasets/funsd.html
+++ b/v0.1.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic03.html b/v0.1.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.0/_modules/doctr/datasets/ic03.html
+++ b/v0.1.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic13.html b/v0.1.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.0/_modules/doctr/datasets/ic13.html
+++ b/v0.1.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiit5k.html b/v0.1.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiithws.html b/v0.1.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/imgur5k.html b/v0.1.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/loader.html b/v0.1.0/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.0/_modules/doctr/datasets/loader.html
+++ b/v0.1.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/mjsynth.html b/v0.1.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ocr.html b/v0.1.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.0/_modules/doctr/datasets/ocr.html
+++ b/v0.1.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/recognition.html b/v0.1.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.0/_modules/doctr/datasets/recognition.html
+++ b/v0.1.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/sroie.html b/v0.1.0/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.0/_modules/doctr/datasets/sroie.html
+++ b/v0.1.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svhn.html b/v0.1.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.0/_modules/doctr/datasets/svhn.html
+++ b/v0.1.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svt.html b/v0.1.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.0/_modules/doctr/datasets/svt.html
+++ b/v0.1.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/synthtext.html b/v0.1.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/utils.html b/v0.1.0/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.0/_modules/doctr/datasets/utils.html
+++ b/v0.1.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/wildreceipt.html b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.0/_modules/doctr/io/elements.html b/v0.1.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.0/_modules/doctr/io/elements.html
+++ b/v0.1.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.0/_modules/doctr/io/html.html b/v0.1.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.0/_modules/doctr/io/html.html
+++ b/v0.1.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/base.html b/v0.1.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.0/_modules/doctr/io/image/base.html
+++ b/v0.1.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/tensorflow.html b/v0.1.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/io/pdf.html b/v0.1.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.0/_modules/doctr/io/pdf.html
+++ b/v0.1.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.0/_modules/doctr/io/reader.html b/v0.1.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.0/_modules/doctr/io/reader.html
+++ b/v0.1.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/zoo.html b/v0.1.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/zoo.html b/v0.1.0/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/factory/hub.html b/v0.1.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.0/_modules/doctr/models/factory/hub.html
+++ b/v0.1.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/zoo.html b/v0.1.0/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/zoo.html b/v0.1.0/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.0/_modules/doctr/models/zoo.html
+++ b/v0.1.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/base.html b/v0.1.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/utils/metrics.html b/v0.1.0/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.0/_modules/doctr/utils/metrics.html
+++ b/v0.1.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.0/_modules/doctr/utils/visualization.html b/v0.1.0/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.0/_modules/doctr/utils/visualization.html
+++ b/v0.1.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.0/_modules/index.html b/v0.1.0/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.0/_modules/index.html
+++ b/v0.1.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.0/_sources/getting_started/installing.rst.txt b/v0.1.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.0/_sources/getting_started/installing.rst.txt
+++ b/v0.1.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.0/_static/basic.css b/v0.1.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.0/_static/basic.css
+++ b/v0.1.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.0/_static/doctools.js b/v0.1.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.0/_static/doctools.js
+++ b/v0.1.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.0/_static/language_data.js b/v0.1.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.0/_static/language_data.js
+++ b/v0.1.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.0/_static/searchtools.js b/v0.1.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.0/_static/searchtools.js
+++ b/v0.1.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
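This searchtools.js hunk set is the only behavioral JavaScript change in the diff: each search result tuple gains a seventh element, `kind` (one of the new `SearchResultKind` values: index, object, text, title), `_displayItem` tags every rendered result with a `kind-${kind}` class so themes can style hit types from CSS alone, and the status line switches from a bare `${resultCount} page(s)` interpolation to proper pluralization via `Documentation.ngettext`. The commented-out `Scorer.score` template near the top of the file changes accordingly; below is a hedged sketch of what a theme-side override could look like under the new tuple shape (the weighting fields mirror the stock defaults, and the title boost is invented for illustration).

// Define before searchtools.js loads; its defaults only apply when
// `typeof Scorer === "undefined"`. Sketch only -- not shipped anywhere.
const Scorer = {
  // Weighting fields expected by searchtools.js (values as in the stock file).
  objNameMatch: 11,
  objPartialMatch: 6,
  objPrio: { 0: 15, 1: 5, 2: -5 },
  objPrioDefault: 0,
  title: 15,
  partialTitle: 7,
  term: 5,
  partialTerm: 2,
  // The hook now receives the seventh `kind` element and can score by hit type.
  score: (result) => {
    const [docname, title, anchor, descr, score, filename, kind] = result;
    return kind === SearchResultKind.title ? score + 10 : score; // arbitrary boost
  },
};

Styling needs no script change at all: because every result list item now carries `kind-${kind}`, a selector such as `li.kind-object` is enough to set API hits apart from plain full-text hits.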
diff --git a/v0.1.0/changelog.html b/v0.1.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.0/changelog.html
+++ b/v0.1.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.0/community/resources.html b/v0.1.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.0/community/resources.html
+++ b/v0.1.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.0/contributing/code_of_conduct.html b/v0.1.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.0/contributing/code_of_conduct.html
+++ b/v0.1.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.0/contributing/contributing.html b/v0.1.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.0/contributing/contributing.html
+++ b/v0.1.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.0/genindex.html b/v0.1.0/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.0/genindex.html
+++ b/v0.1.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.0/getting_started/installing.html b/v0.1.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.0/getting_started/installing.html
+++ b/v0.1.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.0/index.html b/v0.1.0/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.0/index.html
+++ b/v0.1.0/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.0/modules/contrib.html b/v0.1.0/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.0/modules/contrib.html
+++ b/v0.1.0/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.0/modules/datasets.html b/v0.1.0/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.0/modules/datasets.html
+++ b/v0.1.0/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.0/modules/io.html b/v0.1.0/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.0/modules/io.html
+++ b/v0.1.0/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.0/modules/models.html b/v0.1.0/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.0/modules/models.html
+++ b/v0.1.0/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/transforms.html b/v0.1.0/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.0/modules/transforms.html
+++ b/v0.1.0/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.0/using_doctr/custom_models_training.html b/v0.1.0/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.0/using_doctr/custom_models_training.html
+++ b/v0.1.0/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.0/using_doctr/running_on_aws.html b/v0.1.0/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.0/using_doctr/running_on_aws.html
+++ b/v0.1.0/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶<
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for a recognition task
+ detection_task: whether the dataset should be used for a detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
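For context, a minimal usage sketch of the refactored CORD dataset shown above. This is an
illustration based on the constructor in this diff, assuming the archives can be downloaded;
the use_polygons, recognition_task and detection_task flags are the ones added here:

    from doctr.datasets import CORD

    # Default mode: (image, dict(boxes=..., labels=...)) samples
    train_set = CORD(train=True, download=True)
    img, target = train_set[0]

    # Recognition mode: (cropped word image, text string) pairs
    reco_set = CORD(train=True, download=True, recognition_task=True)
    crop, text = reco_set[0]

    # Detection mode: (image, boxes array) samples; recognition_task and
    # detection_task cannot both be True (the constructor raises ValueError)
    det_set = CORD(train=True, download=True, detection_task=True)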
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
- doctr.datasets.core - docTR documentation
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
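`VisionDataset` above only handles download, checksum verification and archive extraction; concrete datasets are expected to populate `self.data` themselves. A minimal subclass sketch against the constructor shown above (the URL and the sample listing are placeholders, not a real dataset):

from doctr.datasets.core import VisionDataset  # module removed by this diff; later versions host the class elsewhere

class MyDataset(VisionDataset):
    def __init__(self, download: bool = False) -> None:
        super().__init__(
            url="https://example.com/my-dataset.zip",  # placeholder URL
            extract_archive=True,
            download=download,
        )
        # fill self.data with (sample, target) pairs, with paths relative to self._root
        self.data = [("img_0001.jpg", "some label")]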
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
doctr.datasets.detection - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
doctr.datasets.doc_artefacts - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# # List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
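In the `use_polygons` branch above, each straight FUNSD box is expanded into its four corners. The corner ordering can be sanity-checked in isolation (a standalone numpy sketch, not library code):

>>> import numpy as np
>>> box = [10, 20, 110, 60]  # xmin, ymin, xmax, ymax
>>> np.array([[box[0], box[1]], [box[2], box[1]], [box[2], box[3]], [box[0], box[3]]])
array([[ 10,  20],
       [110,  20],
       [110,  60],
       [ 10,  60]])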
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
doctr.datasets.generator.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
doctr.datasets.ic03 - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
doctr.datasets.ic13 - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
doctr.datasets.iiit5k - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
doctr.datasets.iiithws - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
doctr.datasets.imgur5k - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
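The reworked `DataLoader` above replaces the `workers` argument with an optional `collate_fn` and gains a `__len__`. A hedged sketch with a toy in-memory dataset (`ToySet` is illustrative, not part of the library):

>>> import tensorflow as tf
>>> from doctr.datasets import DataLoader
>>> class ToySet:  # anything exposing __len__ and __getitem__ works
...     def __len__(self): return 10
...     def __getitem__(self, idx): return tf.zeros((32, 128, 3)), idx
>>> loader = DataLoader(ToySet(), batch_size=4, drop_last=True, collate_fn=lambda samples: list(zip(*samples)))
>>> len(loader)  # floor(10 / 4) batches since drop_last=True
2
>>> images, targets = next(iter(loader))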
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
doctr.datasets.mjsynth - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
doctr.datasets.ocr - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
doctr.datasets.recognition - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates: 8 raw values -> a (4, 2) array of
+ # (x, y) corner coordinates (top left, top right, bottom right, bottom left)
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
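The SROIE parser above stacks the 8 raw coordinates of each line into a (4, 2) polygon and, without `use_polygons`, collapses it to a straight box. The reduction step in isolation (a standalone numpy sketch):

>>> import numpy as np
>>> coords = np.array([[[10, 20], [110, 22], [108, 60], [12, 58]]], dtype=np.float32)  # (N, 4, 2)
>>> np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)  # xmin, ymin, xmax, ymax
array([[ 10.,  20., 110.,  60.]], dtype=float32)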
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
doctr.datasets.svhn - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
doctr.datasets.svt - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
doctr.datasets.synthtext - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char is still not in vocab, fall back to the unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: a array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
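The reworked `encode_sequences` above grows the target size to leave room for the EOS/SOS/PAD symbols. A worked sketch of the padding behaviour, assuming the toy vocab "abc" (indices 0-2 are taken, so 3 and 4 are valid out-of-vocab codes for EOS and PAD):

>>> import numpy as np
>>> from doctr.datasets.utils import encode_sequences, decode_sequence
>>> encode_sequences(["ab", "c"], vocab="abc", eos=3, pad=4)
array([[0, 1, 3, 4],
       [2, 3, 4, 4]], dtype=int32)
>>> decode_sequence(np.array([0, 1]), "abc")
'ab'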
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
doctr.datasets.wildreceipt - docTR documentation
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
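The element hierarchy deleted here (Word -> Line -> Block -> Page -> Document; its successor lives in `doctr.io.elements`, updated later in this diff) renders text by joining children with increasingly large separators. A minimal sketch against the classes listed above:

>>> from doctr.documents.elements import Word, Line, Block, Page, Document
>>> words = [Word("Hello", 0.99, ((0.1, 0.1), (0.3, 0.2))), Word("world", 0.98, ((0.35, 0.1), (0.6, 0.2)))]
>>> page = Page(blocks=[Block(lines=[Line(words)])], page_idx=0, dimensions=(595, 842))
>>> Document(pages=[page]).render()
'Hello world'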
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document, opened as a fitz.Document
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 pdf;
- to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the original one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
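In `convert_page_to_numpy` above, the zoom factors are the requested width and height divided by the page's MediaBox extent. A worked check for an A4 page (595 x 842 points) at the (1024, 726) H x W size suggested in the docstring:

>>> output_size = (1024, 726)  # H x W
>>> page_w, page_h = 595, 842  # A4 MediaBox, in points
>>> scales = (output_size[1] / page_w, output_size[0] / page_h)
>>> [round(s, 3) for s in scales]
[1.22, 1.216]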
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
doctr.io.elements - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
doctr.io.html - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
doctr.io.image.base - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
doctr.io.image.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
doctr.io.pdf - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
doctr.io.reader - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
doctr.models.classification.mobilenet.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
doctr.models.classification.resnet.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
doctr.models.classification.textnet.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
doctr.models.classification.vgg.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
doctr.models.classification.vit.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
doctr.models.classification.zoo - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
-        unclip_ratio: ratio used to unshrink polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
-        bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: The first parameter.
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
-            # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
-            if _box is None or _box[2] < min_size_box or _box[3] < min_size_box:  # remove too small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
-        channels: number of channels to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
-        for idx in range(len(results) - 2, -1, -1):
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
-        feature_extractor: the backbone serving as feature extractor
-        fpn_channels: number of channels each extracted feature map is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
-        """Draw a polygon threshold map on a canvas, as described in the DB paper
-
- Args:
-            polygon: array of coordinates delimiting the polygon boundary
-            canvas: threshold map to fill with polygons
-            mask: mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
-        seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
-                    seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
-                    seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Tuple[int, int, int] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
\ No newline at end of file
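
The heart of the post-processor deleted above is the unclip step: each shrunk text polygon is re-expanded by an offset of poly.area * unclip_ratio / poly.length before a box is fitted. A minimal standalone sketch of that step, using the same shapely/pyclipper/OpenCV calls as the listing (the square is a made-up input, not library data):

import cv2
import numpy as np
import pyclipper
from shapely.geometry import Polygon

def unclip_to_box(points: np.ndarray, unclip_ratio: float = 1.5):
    """Expand a polygon by area / perimeter * unclip_ratio, then fit an (x, y, w, h) box."""
    poly = Polygon(points)
    distance = poly.area * unclip_ratio / poly.length  # offset grows with region "thickness"
    offset = pyclipper.PyclipperOffset()
    offset.AddPath(points.tolist(), pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
    expanded = offset.Execute(distance)  # list of expanded paths, possibly empty
    if not expanded:
        return None
    return cv2.boundingRect(np.asarray(expanded[0], dtype=np.int32))

square = np.array([[10, 10], [40, 10], [40, 30], [10, 30]])  # hypothetical shrunk region
print(unclip_to_box(square))  # a box slightly larger than the input polygon
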
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
-        bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
-            pred: Pred map from the LinkNet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num + 1):
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
-        seg_target = np.zeros(output_shape, dtype=bool)
-        seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Tuple[int, int, int] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
\ No newline at end of file
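
The compute_target method in the listing above reduces to plain box filling with two masking rules (ambiguous boxes and boxes below the minimum size are excluded from the loss). A numpy-only sketch of that logic on a made-up 8x8 grid with hypothetical boxes (just the arithmetic, not the docTR API):

import numpy as np

out_h, out_w = 8, 8
boxes = np.array([[0.1, 0.1, 0.5, 0.5], [0.6, 0.6, 0.65, 0.65]])  # relative xyxy; second box is tiny
flags = np.array([False, False])  # True would mark a box as ambiguous
min_size_box = 2

seg_target = np.zeros((out_h, out_w), dtype=bool)
seg_mask = np.ones((out_h, out_w), dtype=bool)

abs_boxes = boxes.copy()
abs_boxes[:, [0, 2]] *= out_w
abs_boxes[:, [1, 3]] *= out_h
abs_boxes = abs_boxes.round().astype(np.int32)
sizes = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])

for box, size, ambiguous in zip(abs_boxes, sizes, flags):
    if ambiguous or size < min_size_box:
        seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False  # excluded from the loss
        continue
    seg_target[box[1]: box[3] + 1, box[0]: box[2] + 1] = True  # positive segmentation region

print(seg_target.sum(), (~seg_mask).sum())  # 16 filled pixels, 1 masked pixel
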
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
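
The rewritten zoo changes the public entry point: detection_predictor now defaults to "fast_base", also accepts an already-built model instance as arch, and reparameterizes FAST models before wrapping them. A usage sketch matching the new docstring (dummy page data; the pretrained weights are downloaded at call time):

import numpy as np
from doctr.models import detection_predictor

predictor = detection_predictor(arch="db_resnet50", pretrained=True, assume_straight_pages=True)
page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)  # random stand-in for a document image
out = predictor([page])
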
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
\ No newline at end of file
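
All three deleted helpers return a serialized TFLite flatbuffer. For context, a sketch of running such bytes with the standard tf.lite.Interpreter API (the toy Conv2D model is an assumption; any Keras model converts the same way):

import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential, layers

model = Sequential([layers.Conv2D(8, 3, activation="relu", input_shape=(32, 32, 3))])
serialized = tf.lite.TFLiteConverter.from_keras_model(model).convert()

interpreter = tf.lite.Interpreter(model_content=serialized)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.random.rand(1, 32, 32, 3).astype(np.float32))
interpreter.invoke()
out = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
print(out.shape)  # (1, 30, 30, 8) for this toy model
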
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
-        with label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
-            model_output: predicted logits of the model
-            target: list of ground truth words of the batch
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
\ No newline at end of file
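
The CTCPostProcessor deleted above defers decoding to tf.nn.ctc_greedy_decoder with merge_repeated=True. The collapse rule it relies on fits in a few lines of numpy — a sketch with a made-up three-letter vocabulary, the blank index placed after the vocab as in the listing:

import numpy as np

vocab = "abc"
blank = len(vocab)  # index len(vocab) acts as the CTC blank

def greedy_ctc_decode(logits: np.ndarray) -> str:
    """Per-timestep argmax, collapse repeats, then drop blanks."""
    best = logits.argmax(axis=-1)
    collapsed = [int(k) for i, k in enumerate(best) if i == 0 or k != best[i - 1]]
    return "".join(vocab[k] for k in collapsed if k != blank)

# timesteps predicting: a, a, <blank>, b, b, c
path = np.eye(blank + 1)[[0, 0, 3, 1, 1, 2]]
print(greedy_ctc_decode(path))  # "abc"
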
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
-        # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
-            # embedded_symbol: shape (N, embedding_units)
-            embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
-            logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
-            # logits: shape (N, rnn_units), glimpse: shape (N, C)
-            logits = tf.concat([logits, glimpse], axis=-1)
-            # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
-    """Implements a SAR architecture as described in `"Show, Attend and Read: A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- model_output: predicted logits of the model
- gt: the encoded tensor with gt labels
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
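The EOS masking above is worth seeing in isolation. A minimal sketch, assuming a batch of two words of lengths 3 and 5 decoded over 6 timesteps (the per-timestep loss values are illustrative stand-ins, not model outputs):
>>> import tensorflow as tf
>>> seq_len = tf.constant([3, 5]) + 1            # keep one extra step for <eos>
>>> mask_2d = tf.sequence_mask(seq_len, 6)       # shape (2, 6), True on supervised steps
>>> cce = tf.ones((2, 6))                        # stand-in per-timestep losses
>>> masked = tf.where(mask_2d, cce, tf.zeros_like(cce))
>>> loss = tf.reduce_sum(masked, axis=1) / tf.cast(seq_len, tf.float32)
>>> loss.numpy()                                 # steps after <eos> contribute nothing
array([1., 1.], dtype=float32)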
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
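The `<eos>`-splitting logic above can be exercised on its own. A minimal sketch with a hypothetical 3-character vocab whose lookup table appends `<eos>` as the extra class:
>>> import tensorflow as tf
>>> embedding = tf.constant(["a", "b", "c", "<eos>"])
>>> pred = tf.constant([[0, 1, 3, 2]])   # argmax indices over the vocab axis
>>> joined = tf.strings.reduce_join(tf.nn.embedding_lookup(embedding, pred), axis=-1)
>>> words = tf.strings.split(joined, "<eos>")
>>> words[0][0].numpy().decode()         # everything after the first <eos> is dropped
'ab'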
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example:
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
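A usage sketch for the updated factory above; the output format (one (word, confidence) pair per crop) is the one returned by RecognitionPredictor:
>>> import numpy as np
>>> from doctr.models import recognition_predictor
>>> predictor = recognition_predictor("crnn_vgg16_bn", pretrained=True, symmetric_pad=True)
>>> crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)   # a single word crop
>>> predictor([crop])    # e.g. [('hello', 0.98)]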
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the general page orientation
+ based on the segmentation map median line orientation.
+ Then, rotates the page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
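For completeness, a sketch of consuming the predictor's output, relying on the Document structure (pages -> blocks -> lines -> words) documented elsewhere in this package:
>>> import numpy as np
>>> from doctr.models import ocr_predictor
>>> model = ocr_predictor(pretrained=True)
>>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> result = model([input_page])
>>> json_export = result.export()        # nested dict covering the whole document
>>> for block in result.pages[0].blocks:
...     for line in block.lines:
...         print(" ".join(word.value for word in line.words))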
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the general page orientation
+ based on the segmentation map median line orientation.
+ Then, rotates the page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
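Unlike OCRPredictor, the KIE predictor groups detections by class. A hedged sketch of consuming its output, assuming the `predictions` attribute exposed by KIEPredictor result pages:
>>> import numpy as np
>>> from doctr.models import kie_predictor
>>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
>>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([input_page])
>>> for class_name, predictions in out.pages[0].predictions.items():
...     print(class_name, [p.value for p in predictions])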
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
-
- doctr.transforms.modules - docTR documentation
-
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example::
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness(max_delta=0.3)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example::
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast(delta=0.3)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example::
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation(delta=0.5)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue(max_delta=0.3)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example::
- >>> from doctr.transforms import RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomGamma(min_gamma=0.5, max_gamma=1.5)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality(min_quality=60)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: lower bound of the JPEG quality, as an int in [0, 100]
- max_quality: upper bound of the JPEG quality, as an int in [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomJpegQuality, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations; only one of them will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
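Putting the modules above together, a minimal sketch of a composed augmentation pipeline; parameter values are illustrative, and the Normalize statistics reuse the ImageNet values from the Normalize example above:
>>> import tensorflow as tf
>>> from doctr.transforms import (Compose, Resize, Normalize, OneOf,
...                               RandomBrightness, RandomContrast, RandomApply, RandomGamma)
>>> pipeline = Compose([
...     Resize((32, 32)),
...     OneOf([RandomBrightness(max_delta=0.3), RandomContrast(delta=0.3)]),
...     RandomApply(RandomGamma(), p=0.5),
...     Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
... ])
>>> out = pipeline(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))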
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
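A few calls illustrating the four tolerance levels (assuming anyascii maps 'é' to 'e'):
>>> string_match("Hello", "Hello")
(True, True, True, True)
>>> string_match("Hello", "hello")    # recovered once case is ignored
(False, True, False, True)
>>> string_match("café", "CAFE")      # needs both transliteration and lower-casing
(False, False, False, True)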
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
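A worked example: a 100x100 ground truth against a 70x70 prediction sharing its origin gives an intersection of 4900 and a union of 10000, hence:
>>> import numpy as np
>>> box_iou(np.array([[0, 0, 100, 100]]), np.array([[0, 0, 70, 70]]))
array([[0.49]], dtype=float32)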
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
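A worked example with two axis-aligned squares written as 4-point polygons: a 2x2 square and a copy shifted by 1 overlap on a 1x2 strip, so IoU = 2 / (4 + 4 - 2) = 1/3:
>>> import numpy as np
>>> polys_1 = np.array([[[0, 0], [2, 0], [2, 2], [0, 2]]])
>>> polys_2 = np.array([[[1, 0], [3, 0], [3, 2], [1, 2]]])
>>> polygon_iou(polys_1, polys_2)
array([[0.33333334]], dtype=float32)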
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
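A usage sketch: with two heavily overlapping boxes and one disjoint box, the lower-scored duplicate is suppressed at thresh=0.5:
>>> import numpy as np
>>> boxes = np.array([
...     [0, 0, 100, 100, 0.9],      # highest score, kept
...     [5, 5, 100, 100, 0.8],      # IoU ~0.9 with the first one, suppressed
...     [200, 200, 300, 300, 0.7],  # disjoint, kept
... ])
>>> nms(boxes, thresh=0.5)          # -> [0, 2]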
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
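Walking the docstring example through by hand: the 70x70 prediction has IoU 0.49 with the 100x100 ground truth, below the 0.5 threshold, so no pair is kept:
>>> import numpy as np
>>> metric = LocalizationConfusion(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
>>> recall, precision, mean_iou = metric.summary()
>>> recall, precision                # (0.0, 0.0): the 0.49-IoU pair is rejected
>>> mean_iou                         # ~0.24: (0.49 + 0.0) / 2, rounded to 2 decimals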
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and labels for both the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
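To make the matching logic in `update` concrete, here is a self-contained sketch of the IoU computation and Hungarian assignment it relies on (`box_iou` below is a simplified stand-in for doctr's own helper, handling axis-aligned (xmin, ymin, xmax, ymax) boxes only):

import numpy as np
from scipy.optimize import linear_sum_assignment

def box_iou(gts: np.ndarray, preds: np.ndarray) -> np.ndarray:
    # pairwise IoU between (N, 4) ground truths and (M, 4) predictions
    lt = np.maximum(gts[:, None, :2], preds[None, :, :2])
    rb = np.minimum(gts[:, None, 2:], preds[None, :, 2:])
    inter = np.clip(rb - lt, 0, None).prod(axis=-1)
    area_g = (gts[:, 2:] - gts[:, :2]).prod(axis=-1)
    area_p = (preds[:, 2:] - preds[:, :2]).prod(axis=-1)
    return inter / (area_g[:, None] + area_p[None, :] - inter)

gts = np.array([[0.0, 0.0, 100.0, 100.0]])
preds = np.array([[0.0, 0.0, 80.0, 80.0], [110.0, 95.0, 200.0, 150.0]])
iou_mat = box_iou(gts, preds)
gt_idx, pred_idx = linear_sum_assignment(-iou_mat)  # negate to maximize total IoU
is_kept = iou_mat[gt_idx, pred_idx] >= 0.5
print(int(is_kept.sum()), "pair(s) matched above the IoU threshold")  # 1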
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
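For illustration, a short sketch of how this dispatch behaves (assuming `create_obj_patch` is imported from doctr.utils.visualization; it is a module-level helper rather than part of the public `__all__`):

import numpy as np
from doctr.utils.visualization import create_obj_patch

# straight box: ((xmin, ymin), (xmax, ymax)) in relative coords -> Rectangle patch
patch = create_obj_patch(((0.1, 0.1), (0.4, 0.2)), (600, 800), color=(0, 0, 1))

# rotated box: (4, 2) array of relative corner coordinates -> Polygon patch
corners = np.array([[0.1, 0.1], [0.4, 0.1], [0.4, 0.2], [0.1, 0.2]])
patch = create_obj_patch(corners, (600, 800), color=(1, 0, 0))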
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mpl Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mpl Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
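A minimal usage sketch of draw_boxes (the image and boxes are synthetic; since the function renders with cv2 and displays via matplotlib, a final plt.show() is needed to see the result):

import numpy as np
import matplotlib.pyplot as plt
from doctr.utils.visualization import draw_boxes

image = np.full((600, 800, 3), 255, dtype=np.uint8)  # blank white page
boxes = np.array([[0.1, 0.1, 0.4, 0.2], [0.5, 0.3, 0.9, 0.45]])  # relative (xmin, ymin, xmax, ymax)
draw_boxes(boxes, image, color=(255, 0, 0))
plt.show()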
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-.. autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its own way to load a sample, but batch aggregation and the underlying iterator are tasks deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
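For illustration, a hedged sketch of what the encoding does (doctest-style; the vocab here is a toy example, and the exact padding behavior may differ across versions):

>>> from doctr.datasets import encode_sequences
>>> vocab = "0123456789abcdefghijklmnopqrstuvwxyz"
>>> # each character is mapped to its index in `vocab`,
>>> # and sequences are padded with the `eos` value (-1 by default) up to `target_size`
>>> encoded = encode_sequences(["doctr", "ocr"], vocab=vocab, target_size=8)
>>> encoded.shape
(2, 8)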
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words on the same horizontal level belong to two distinct Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developer mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Whether performed jointly or separately, each task corresponds to a type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used an AWS c5.12xlarge instance (Xeon Platinum 8275L CPU) to perform the experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
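The three steps above correspond roughly to the sketch below (a non-authoritative outline assuming a TensorFlow backend; the normalization statistics are placeholder values, not DocTR's actual ones):

import tensorflow as tf

def preprocess_for_detection(images, target_size=(1024, 1024)):
    # 1. resize each image (bilinear, no aspect-ratio preservation -> potential deformation)
    resized = [tf.image.resize(img, target_size, method="bilinear") for img in images]
    # 2. batch images together
    batch = tf.stack(resized, axis=0)
    # 3. normalize with (placeholder) training-data statistics
    mean = tf.constant([0.80, 0.79, 0.77])
    std = tf.constant([0.26, 0.27, 0.29])
    return (batch / 255.0 - mean) / std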
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (a binary segmentation map, for instance) into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors let you pass numpy images as inputs and get structured information back.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 30595 word-level crops, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used an AWS c5.12xlarge instance (Xeon Platinum 8275L CPU) to perform the experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
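Similarly, a hedged sketch of the four recognition pre-processing steps (TensorFlow backend assumed; the normalization statistics are placeholders):

import tensorflow as tf

def preprocess_for_recognition(crops, target_size=(32, 128)):
    processed = []
    for crop in crops:
        # 1. resize while preserving the aspect ratio
        resized = tf.image.resize(crop, target_size, preserve_aspect_ratio=True)
        # 2. pad to the target size with zeros
        padded = tf.image.pad_to_bounding_box(resized, 0, 0, *target_size)
        processed.append(padded)
    # 3. batch images together
    batch = tf.stack(processed, axis=0)
    # 4. normalize with (placeholder) training-data statistics
    return (batch / 255.0 - 0.5) / 0.5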
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence) into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed as follows: we instantiate the predictor, warm up the model, and then measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used an AWS c5.12xlarge instance (Xeon Platinum 8275L CPU) to perform the experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection stage will be used to produce cropped images that will be passed to the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both the training and inference procedures. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module gathers non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model's performance.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
- doctr.datasets - docTR documentation
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset forPost-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its own way to load a sample, but batch aggregation and the underlying iterator are tasks deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-¶
-
-
-
-
-
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
- doctr.documents - docTR documentation
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page's size
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words on the same horizontal level belong to two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF file as a bytes stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
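Since the result is a plain PDF byte stream, it can be written to disk or handed straight to read_pdf, which accepts bytes per its signature; a minimal sketch:

from doctr.documents import read_html, read_pdf

pdf_bytes = read_html("https://www.yoursite.com")
with open("page.pdf", "wb") as f:
    f.write(pdf_bytes)       # persist the rendered page
pages = read_pdf(pdf_bytes)  # or decode it directly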
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple file formats
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
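Because from_url returns the same PDF template as from_pdf, the PDF helpers documented below work on web pages too; a minimal sketch:

from doctr.documents import DocumentFile

# Fetch a web page, render it to PDF, then reuse the PDF helpers
pages = DocumentFile.from_url("https://www.yoursite.com").as_images()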
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
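Each page entry pairs a 4-float bounding box with the word value, so the annotations can be walked directly; a minimal sketch:

from doctr.documents import DocumentFile

doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
for page_idx, page_words in enumerate(doc.get_words()):
    for (xmin, ymin, xmax, ymax), value in page_words:
        print(f"page {page_idx}: {value!r} at ({xmin:.2f}, {ymin:.2f}, {xmax:.2f}, {ymax:.2f})")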
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
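A minimal sketch counting the detected artefacts per page, following the return format above:

from doctr.documents import DocumentFile

artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
for page_idx, boxes in enumerate(artefacts):
    print(f"page {page_idx}: {len(boxes)} artefact(s)")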
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository doctr.
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset forPost-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-Getting Started¶
-
-- Installation
-Contents¶
@@ -364,7 +381,7 @@ Contents
ArtefactDetection
diff --git a/latest/using_doctr/using_datasets.html b/latest/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/latest/using_doctr/using_datasets.html
+++ b/latest/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
diff --git a/latest/using_doctr/using_model_export.html b/latest/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/latest/using_doctr/using_model_export.html
+++ b/latest/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
diff --git a/latest/using_doctr/using_models.html b/latest/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/latest/using_doctr/using_models.html
+++ b/latest/using_doctr/using_models.html
@@ -14,7 +14,7 @@
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
diff --git a/modules/contrib.html b/modules/contrib.html
index 22b0c508a6..b8878635b6 100644
--- a/modules/contrib.html
+++ b/modules/contrib.html
@@ -14,7 +14,7 @@
doctr.contrib - docTR documentation
@@ -376,7 +376,7 @@ Supported contribution modules
diff --git a/modules/datasets.html b/modules/datasets.html
index 0fe4b78d48..dfcacbc96e 100644
--- a/modules/datasets.html
+++ b/modules/datasets.html
@@ -14,7 +14,7 @@
doctr.datasets - docTR documentation
@@ -1077,7 +1077,7 @@ Returns:
diff --git a/modules/io.html b/modules/io.html
index 924d292c59..77e9e017bf 100644
--- a/modules/io.html
+++ b/modules/io.html
@@ -14,7 +14,7 @@
doctr.io - docTR documentation
@@ -756,7 +756,7 @@ Returns:¶
diff --git a/modules/models.html b/modules/models.html
index bf45d11a71..f4a9833365 100644
--- a/modules/models.html
+++ b/modules/models.html
@@ -14,7 +14,7 @@
doctr.models - docTR documentation
@@ -1598,7 +1598,7 @@ Args:¶
diff --git a/modules/transforms.html b/modules/transforms.html
index 6d77d16e7b..bc254c867b 100644
--- a/modules/transforms.html
+++ b/modules/transforms.html
@@ -14,7 +14,7 @@
doctr.transforms - docTR documentation
@@ -831,7 +831,7 @@ Args:¶<
diff --git a/modules/utils.html b/modules/utils.html
index 3dd3ecbd96..6784d81f6f 100644
--- a/modules/utils.html
+++ b/modules/utils.html
@@ -14,7 +14,7 @@
doctr.utils - docTR documentation
@@ -711,7 +711,7 @@ Args:¶
diff --git a/notebooks.html b/notebooks.html
index f3ea994e49..647f73d4eb 100644
--- a/notebooks.html
+++ b/notebooks.html
@@ -14,7 +14,7 @@
docTR Notebooks - docTR documentation
@@ -387,7 +387,7 @@ docTR Notebooks
diff --git a/search.html b/search.html
index f0693e2c97..0e0da5efb3 100644
--- a/search.html
+++ b/search.html
@@ -14,7 +14,7 @@
Search - docTR documentation
@@ -336,7 +336,7 @@
diff --git a/searchindex.js b/searchindex.js
index 8598997441..df18967072 100644
--- a/searchindex.js
+++ b/searchindex.js
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[1, "correction"]], "2. Warning": [[1, "warning"]], "3. Temporary Ban": [[1, "temporary-ban"]], "4. Permanent Ban": [[1, "permanent-ban"]], "AWS Lambda": [[13, null]], "Advanced options": [[18, "advanced-options"]], "Args:": [[6, "args"], [6, "id4"], [6, "id7"], [6, "id10"], [6, "id13"], [6, "id16"], [6, "id19"], [6, "id22"], [6, "id25"], [6, "id29"], [6, "id32"], [6, "id37"], [6, "id40"], [6, "id46"], [6, "id49"], [6, "id50"], [6, "id51"], [6, "id54"], [6, "id57"], [6, "id60"], [6, "id61"], [7, "args"], [7, "id2"], [7, "id3"], [7, "id4"], [7, "id5"], [7, "id6"], [7, "id7"], [7, "id10"], [7, "id12"], [7, "id14"], [7, "id16"], [7, "id20"], [7, "id24"], [7, "id28"], [8, "args"], [8, "id3"], [8, "id8"], [8, "id13"], [8, "id17"], [8, "id21"], [8, "id26"], [8, "id31"], [8, "id36"], [8, "id41"], [8, "id46"], [8, "id50"], [8, "id54"], [8, "id59"], [8, "id63"], [8, "id68"], [8, "id73"], [8, "id77"], [8, "id81"], [8, "id85"], [8, "id90"], [8, "id95"], [8, "id99"], [8, "id104"], [8, "id109"], [8, "id114"], [8, "id119"], [8, "id123"], [8, "id127"], [8, "id132"], [8, "id137"], [8, "id142"], [8, "id146"], [8, "id150"], [8, "id155"], [8, "id159"], [8, "id163"], [8, "id167"], [8, "id169"], [8, "id171"], [8, "id173"], [9, "args"], [9, "id1"], [9, "id2"], [9, "id3"], [9, "id4"], [9, "id5"], [9, "id6"], [9, "id7"], [9, "id8"], [9, "id9"], [9, "id10"], [9, "id11"], [9, "id12"], [9, "id13"], [9, "id14"], [9, "id15"], [9, "id16"], [9, "id17"], [9, "id18"], [9, "id19"], [10, "args"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"]], "Artefact": [[7, "artefact"]], "ArtefactDetection": [[15, "artefactdetection"]], "Attribution": [[1, "attribution"]], "Available Datasets": [[16, "available-datasets"]], "Available architectures": [[18, "available-architectures"], [18, "id1"], [18, "id2"]], "Available contribution modules": [[15, "available-contribution-modules"]], "Block": [[7, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[16, null]], "Choosing the right model": [[18, null]], "Classification": [[14, "classification"]], "Code quality": [[2, "code-quality"]], "Code style verification": [[2, "code-style-verification"]], "Codebase structure": [[2, "codebase-structure"]], "Commits": [[2, "commits"]], "Composing transformations": [[9, "composing-transformations"]], "Continuous Integration": [[2, "continuous-integration"]], "Contributing to docTR": [[2, null]], "Contributor Covenant Code of Conduct": [[1, null]], "Custom dataset loader": [[6, "custom-dataset-loader"]], "Custom orientation classification models": [[12, "custom-orientation-classification-models"]], "Data Loading": [[16, "data-loading"]], "Dataloader": [[6, "dataloader"]], "Detection": [[14, "detection"], [16, "detection"]], "Detection predictors": [[18, "detection-predictors"]], "Developer mode installation": [[2, "developer-mode-installation"]], "Developing docTR": [[2, "developing-doctr"]], "Document": [[7, "document"]], "Document structure": [[7, "document-structure"]], "End-to-End OCR": [[18, "end-to-end-ocr"]], "Enforcement": [[1, "enforcement"]], "Enforcement Guidelines": [[1, "enforcement-guidelines"]], "Enforcement Responsibilities": [[1, "enforcement-responsibilities"]], "Export to ONNX": [[17, "export-to-onnx"]], "Feature requests & bug report": [[2, "feature-requests-bug-report"]], "Feedback": [[2, "feedback"]], "File reading": [[7, "file-reading"]], "Half-precision": [[17, "half-precision"]], "Installation": [[3, null]], 
"Integrate contributions into your pipeline": [[15, null]], "Let\u2019s connect": [[2, "let-s-connect"]], "Line": [[7, "line"]], "Loading from Huggingface Hub": [[14, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[12, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[12, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[4, "main-features"]], "Model optimization": [[17, "model-optimization"]], "Model zoo": [[4, "model-zoo"]], "Modifying the documentation": [[2, "modifying-the-documentation"]], "Naming conventions": [[14, "naming-conventions"]], "OCR": [[16, "ocr"]], "Object Detection": [[16, "object-detection"]], "Our Pledge": [[1, "our-pledge"]], "Our Standards": [[1, "our-standards"]], "Page": [[7, "page"]], "Preparing your model for inference": [[17, null]], "Prerequisites": [[3, "prerequisites"]], "Pretrained community models": [[14, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[14, "pushing-to-the-huggingface-hub"]], "Questions": [[2, "questions"]], "Recognition": [[14, "recognition"], [16, "recognition"]], "Recognition predictors": [[18, "recognition-predictors"]], "Returns:": [[6, "returns"], [7, "returns"], [7, "id11"], [7, "id13"], [7, "id15"], [7, "id19"], [7, "id23"], [7, "id27"], [7, "id31"], [8, "returns"], [8, "id6"], [8, "id11"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id29"], [8, "id34"], [8, "id39"], [8, "id44"], [8, "id49"], [8, "id53"], [8, "id57"], [8, "id62"], [8, "id66"], [8, "id71"], [8, "id76"], [8, "id80"], [8, "id84"], [8, "id88"], [8, "id93"], [8, "id98"], [8, "id102"], [8, "id107"], [8, "id112"], [8, "id117"], [8, "id122"], [8, "id126"], [8, "id130"], [8, "id135"], [8, "id140"], [8, "id145"], [8, "id149"], [8, "id153"], [8, "id158"], [8, "id162"], [8, "id166"], [8, "id168"], [8, "id170"], [8, "id172"], [10, "returns"]], "Scope": [[1, "scope"]], "Share your model with the community": [[14, null]], "Supported Vocabs": [[6, "supported-vocabs"]], "Supported contribution modules": [[5, "supported-contribution-modules"]], "Supported datasets": [[4, "supported-datasets"]], "Supported transformations": [[9, "supported-transformations"]], "Synthetic dataset generator": [[6, "synthetic-dataset-generator"], [16, "synthetic-dataset-generator"]], "Task evaluation": [[10, "task-evaluation"]], "Text Detection": [[18, "text-detection"]], "Text Recognition": [[18, "text-recognition"]], "Text detection models": [[4, "text-detection-models"]], "Text recognition models": [[4, "text-recognition-models"]], "Train your own model": [[12, null]], "Two-stage approaches": [[18, "two-stage-approaches"]], "Unit tests": [[2, "unit-tests"]], "Use your own datasets": [[16, "use-your-own-datasets"]], "Using your ONNX exported model": [[17, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[3, "via-conda-only-for-linux"]], "Via Git": [[3, "via-git"]], "Via Python Package": [[3, "via-python-package"]], "Visualization": [[10, "visualization"]], "What should I do with the output?": [[18, "what-should-i-do-with-the-output"]], "Word": [[7, "word"]], "docTR Notebooks": [[11, null]], "docTR Vocabs": [[6, "id62"]], "docTR: Document Text Recognition": [[4, null]], "doctr.contrib": [[5, null]], "doctr.datasets": [[6, null], [6, "datasets"]], "doctr.io": [[7, null]], "doctr.models": [[8, null]], "doctr.models.classification": [[8, "doctr-models-classification"]], "doctr.models.detection": [[8, "doctr-models-detection"]], "doctr.models.factory": [[8, 
"doctr-models-factory"]], "doctr.models.recognition": [[8, "doctr-models-recognition"]], "doctr.models.zoo": [[8, "doctr-models-zoo"]], "doctr.transforms": [[9, null]], "doctr.utils": [[10, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[7, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[7, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[9, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[6, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[9, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[9, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[6, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[6, "doctr.datasets.loader.DataLoader", false]], 
"db_mobilenet_v3_large() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[8, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[6, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[6, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[7, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[7, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[6, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[6, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[9, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[9, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[6, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[6, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[6, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[6, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[6, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[8, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[9, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[7, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[6, "doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() 
(in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[9, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[8, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[6, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[9, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[7, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[9, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[9, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[9, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[9, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[9, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[9, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[9, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[9, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[9, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[9, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[9, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[9, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[7, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[7, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[7, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[6, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[9, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[8, 
"doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[7, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[7, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[6, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[6, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[6, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[6, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[9, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[10, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[6, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[7, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[6, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[6, 0, 1, "", "CORD"], [6, 0, 1, "", "CharacterGenerator"], [6, 0, 1, "", "DetectionDataset"], [6, 0, 1, "", "DocArtefacts"], [6, 0, 1, "", "FUNSD"], [6, 0, 1, "", "IC03"], [6, 0, 1, "", "IC13"], [6, 0, 1, "", "IIIT5K"], [6, 0, 1, "", "IIITHWS"], [6, 0, 1, "", "IMGUR5K"], [6, 0, 1, "", "MJSynth"], [6, 0, 1, "", "OCRDataset"], [6, 0, 1, "", "RecognitionDataset"], [6, 0, 1, "", "SROIE"], [6, 0, 1, "", "SVHN"], [6, 0, 1, "", "SVT"], [6, 0, 1, "", "SynthText"], [6, 0, 1, "", 
"WILDRECEIPT"], [6, 0, 1, "", "WordGenerator"], [6, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[6, 0, 1, "", "DataLoader"]], "doctr.io": [[7, 0, 1, "", "Artefact"], [7, 0, 1, "", "Block"], [7, 0, 1, "", "Document"], [7, 0, 1, "", "DocumentFile"], [7, 0, 1, "", "Line"], [7, 0, 1, "", "Page"], [7, 0, 1, "", "Word"], [7, 1, 1, "", "decode_img_as_tensor"], [7, 1, 1, "", "read_html"], [7, 1, 1, "", "read_img_as_numpy"], [7, 1, 1, "", "read_img_as_tensor"], [7, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[7, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[7, 2, 1, "", "from_images"], [7, 2, 1, "", "from_pdf"], [7, 2, 1, "", "from_url"]], "doctr.io.Page": [[7, 2, 1, "", "show"]], "doctr.models": [[8, 1, 1, "", "kie_predictor"], [8, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[8, 1, 1, "", "crop_orientation_predictor"], [8, 1, 1, "", "magc_resnet31"], [8, 1, 1, "", "mobilenet_v3_large"], [8, 1, 1, "", "mobilenet_v3_large_r"], [8, 1, 1, "", "mobilenet_v3_small"], [8, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [8, 1, 1, "", "mobilenet_v3_small_page_orientation"], [8, 1, 1, "", "mobilenet_v3_small_r"], [8, 1, 1, "", "page_orientation_predictor"], [8, 1, 1, "", "resnet18"], [8, 1, 1, "", "resnet31"], [8, 1, 1, "", "resnet34"], [8, 1, 1, "", "resnet50"], [8, 1, 1, "", "textnet_base"], [8, 1, 1, "", "textnet_small"], [8, 1, 1, "", "textnet_tiny"], [8, 1, 1, "", "vgg16_bn_r"], [8, 1, 1, "", "vit_b"], [8, 1, 1, "", "vit_s"]], "doctr.models.detection": [[8, 1, 1, "", "db_mobilenet_v3_large"], [8, 1, 1, "", "db_resnet50"], [8, 1, 1, "", "detection_predictor"], [8, 1, 1, "", "fast_base"], [8, 1, 1, "", "fast_small"], [8, 1, 1, "", "fast_tiny"], [8, 1, 1, "", "linknet_resnet18"], [8, 1, 1, "", "linknet_resnet34"], [8, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[8, 1, 1, "", "from_hub"], [8, 1, 1, "", "login_to_hub"], [8, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[8, 1, 1, "", "crnn_mobilenet_v3_large"], [8, 1, 1, "", "crnn_mobilenet_v3_small"], [8, 1, 1, "", "crnn_vgg16_bn"], [8, 1, 1, "", "master"], [8, 1, 1, "", "parseq"], [8, 1, 1, "", "recognition_predictor"], [8, 1, 1, "", "sar_resnet31"], [8, 1, 1, "", "vitstr_base"], [8, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[9, 0, 1, "", "ChannelShuffle"], [9, 0, 1, "", "ColorInversion"], [9, 0, 1, "", "Compose"], [9, 0, 1, "", "GaussianBlur"], [9, 0, 1, "", "GaussianNoise"], [9, 0, 1, "", "LambdaTransformation"], [9, 0, 1, "", "Normalize"], [9, 0, 1, "", "OneOf"], [9, 0, 1, "", "RandomApply"], [9, 0, 1, "", "RandomBrightness"], [9, 0, 1, "", "RandomContrast"], [9, 0, 1, "", "RandomCrop"], [9, 0, 1, "", "RandomGamma"], [9, 0, 1, "", "RandomHorizontalFlip"], [9, 0, 1, "", "RandomHue"], [9, 0, 1, "", "RandomJpegQuality"], [9, 0, 1, "", "RandomResize"], [9, 0, 1, "", "RandomRotate"], [9, 0, 1, "", "RandomSaturation"], [9, 0, 1, "", "RandomShadow"], [9, 0, 1, "", "Resize"], [9, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[10, 0, 1, "", "DetectionMetric"], [10, 0, 1, "", "LocalizationConfusion"], [10, 0, 1, "", "OCRMetric"], [10, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.OCRMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.visualization": [[10, 1, 1, "", "visualize_page"]]}, 
"objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [1, 7, 8, 10, 14, 17], "0": [1, 3, 6, 9, 10, 12, 15, 16, 18], "00": 18, "01": 18, "0123456789": 6, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "02562": 8, "03": 18, "035": 18, "0361328125": 18, "04": 18, "05": 18, "06": 18, "06640625": 18, "07": 18, "08": [9, 18], "09": 18, "0966796875": 18, "1": [6, 7, 8, 9, 10, 12, 16, 18], "10": [6, 10, 18], "100": [6, 9, 10, 16, 18], "1000": 18, "101": 6, "1024": [8, 12, 18], "104": 6, "106": 6, "108": 6, "1095": 16, "11": 18, "110": 10, "1107": 16, "114": 6, "115": 6, "1156": 16, "116": 6, "118": 6, "11800h": 18, "11th": 18, "12": 18, "120": 6, "123": 6, "126": 6, "1268": 16, "128": [8, 12, 17, 18], "13": 18, "130": 6, "13068": 16, "131": 6, "1337891": 16, "1357421875": 18, "1396484375": 18, "14": 18, "1420": 18, "14470v1": 6, "149": 16, "15": 18, "150": [10, 18], "1552": 18, "16": [8, 17, 18], "1630859375": 18, "1684": 18, "16x16": 8, "17": 18, "1778": 18, "1782": 18, "18": [8, 18], "185546875": 18, "1900": 18, "1910": 8, "19342": 16, "19370": 16, "195": 6, "19598": 16, "199": 18, "1999": 18, "2": [3, 4, 6, 7, 9, 15, 18], "20": 18, "200": 10, "2000": 16, "2003": [4, 6], "2012": 6, "2013": [4, 6], "2015": 6, "2019": 4, "207901": 16, "21": 18, "2103": 6, "2186": 16, "21888": 16, "22": 18, "224": [8, 9], "225": 9, "22672": 16, "229": [9, 16], "23": 18, "233": 16, "236": 6, "24": 18, "246": 16, "249": 16, "25": 18, "2504": 18, "255": [7, 8, 9, 10, 18], "256": 8, "257": 16, "26": 18, "26032": 16, "264": 12, "27": 18, "2700": 16, "2710": 18, "2749": 12, "28": 18, "287": 12, "29": 18, "296": 12, "299": 12, "2d": 18, "3": [3, 4, 7, 8, 9, 10, 17, 18], "30": 18, "300": 16, "3000": 16, "301": 12, "30595": 18, "30ghz": 18, "31": 8, "32": [6, 8, 9, 12, 16, 17, 18], "3232421875": 18, "33": [9, 18], "33402": 16, "33608": 16, "34": [8, 18], "340": 18, "3456": 18, "3515625": 18, "36": 18, "360": 16, "37": [6, 18], "38": 18, "39": 18, "4": [8, 9, 10, 18], "40": 18, "406": 9, "41": 18, "42": 18, "43": 18, "44": 18, "45": 18, "456": 9, "46": 18, "47": 18, "472": 16, "48": [6, 18], "485": 9, "49": 18, "49377": 16, "5": [6, 9, 10, 15, 18], "50": [8, 16, 18], "51": 18, "51171875": 18, "512": 8, "52": [6, 18], "529": 18, "53": 18, "54": 18, "540": 18, "5478515625": 18, "55": 18, "56": 18, "57": 18, "58": [6, 18], "580": 18, "5810546875": 18, "583": 18, "59": 18, "597": 18, "5k": [4, 6], "5m": 18, "6": [9, 18], "60": 9, "600": [8, 10, 18], "61": 18, "62": 18, "626": 16, "63": 18, "64": [8, 9, 18], "641": 18, "647": 16, "65": 18, "66": 18, "67": 18, "68": 18, "69": 18, "693": 12, "694": 12, "695": 12, "6m": 18, "7": 18, "70": [6, 10, 18], "707470": 16, "71": [6, 18], "7100000": 16, "7141797": 16, "7149": 16, "72": 18, "72dpi": 7, "73": 18, "73257": 16, "74": 18, "75": [9, 18], "7581382": 16, "76": 18, "77": 18, "772": 12, "772875": 16, "78": 18, "785": 12, "79": 18, "793533": 16, "796": 16, "798": 12, "7m": 18, "8": [8, 9, 18], "80": 18, "800": [8, 10, 16, 18], "81": 18, "82": 18, "83": 18, "84": 18, "849": 16, "85": 18, "8564453125": 18, "857": 18, "85875": 16, "86": 18, "8603515625": 18, "87": 18, "8707": 16, "88": 18, "89": 18, "9": [3, 9, 18], "90": 18, "90k": 6, "90kdict32px": 6, "91": 18, "914085328578949": 18, "92": 18, "93": 18, "94": [6, 18], "95": 
[10, 18], "9578408598899841": 18, "96": 18, "97": 18, "98": 18, "99": 18, "9949972033500671": 18, "A": [1, 2, 4, 6, 7, 8, 11, 17], "As": 2, "Be": 18, "Being": 1, "By": 13, "For": [1, 2, 3, 12, 18], "If": [2, 7, 8, 12, 18], "In": [2, 6, 16], "It": [9, 14, 15, 17], "Its": [4, 8], "No": [1, 18], "Of": 6, "Or": [15, 17], "The": [1, 2, 6, 7, 10, 13, 15, 16, 17, 18], "Then": 8, "To": [2, 3, 13, 14, 15, 17, 18], "_": [1, 6, 8], "__call__": 18, "_build": 2, "_i": 10, "ab": 6, "abc": 17, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "abdef": [6, 16], "abl": [16, 18], "about": [1, 16, 18], "abov": 18, "abstractdataset": 6, "abus": 1, "accept": 1, "access": [4, 7, 16, 18], "account": [1, 14], "accur": 18, "accuraci": 10, "achiev": 17, "act": 1, "action": 1, "activ": 4, "ad": [2, 8, 9], "adapt": 1, "add": [9, 10, 14, 18], "add_hook": 18, "add_label": 10, "addit": [2, 3, 7, 15, 18], "addition": [2, 18], "address": [1, 7], "adjust": 9, "advanc": 1, "advantag": 17, "advis": 2, "aesthet": [4, 6], "affect": 1, "after": [14, 18], "ag": 1, "again": 8, "aggreg": [10, 16], "aggress": 1, "align": [1, 7, 9], "all": [1, 2, 5, 6, 7, 9, 10, 15, 16, 18], "allow": [1, 17], "along": 18, "alreadi": [2, 17], "also": [1, 8, 14, 15, 16, 18], "alwai": 16, "an": [1, 2, 4, 6, 7, 8, 10, 15, 17, 18], "analysi": [7, 15], "ancient_greek": 6, "angl": [7, 9], "ani": [1, 6, 7, 8, 9, 10, 17, 18], "annot": 6, "anot": 16, "anoth": [8, 12, 16], "answer": 1, "anyascii": 10, "anyon": 4, "anyth": 15, "api": [2, 4], "apolog": 1, "apologi": 1, "app": 2, "appear": 1, "appli": [1, 6, 9], "applic": [4, 8], "appoint": 1, "appreci": 14, "appropri": [1, 2, 18], "ar": [1, 2, 3, 5, 6, 7, 9, 10, 11, 15, 16, 18], "arab": 6, "arabic_diacrit": 6, "arabic_lett": 6, "arabic_punctu": 6, "arbitrarili": [4, 8], "arch": [8, 14], "architectur": [4, 8, 14, 15], "area": 18, "argument": [6, 7, 8, 10, 12, 18], "around": 1, "arrai": [7, 9, 10], "art": [4, 15], "artefact": [10, 15, 18], "artefact_typ": 7, "artifici": [4, 6], "arxiv": [6, 8], "asarrai": 10, "ascii_lett": 6, "aspect": [4, 8, 9, 18], "assess": 10, "assign": 10, "associ": 7, "assum": 8, "assume_straight_pag": [8, 12, 18], "astyp": [8, 10, 18], "attack": 1, "attend": [4, 8], "attent": [1, 8], "autom": 4, "automat": 18, "autoregress": [4, 8], "avail": [1, 4, 5, 9], "averag": [9, 18], "avoid": [1, 3], "aw": [4, 18], "awar": 18, "azur": 18, "b": [8, 10, 18], "b_j": 10, "back": 2, "backbon": 8, "backend": 18, "background": 16, "bangla": 6, "bar": 15, "bar_cod": 16, "base": [4, 8, 15], "baselin": [4, 8, 18], "batch": [6, 8, 9, 15, 16, 18], "batch_siz": [6, 12, 15, 16, 17], "bblanchon": 3, "bbox": 18, "becaus": 13, "been": [2, 10, 16, 18], "befor": [6, 8, 9, 18], "begin": 10, "behavior": [1, 18], "being": [10, 18], "belong": 18, "benchmark": 18, "best": 1, "better": [11, 18], "between": [9, 10, 18], "bgr": 7, "bilinear": 9, "bin_thresh": 18, "binar": [4, 8, 18], "binari": [7, 17, 18], "bit": 17, "block": [10, 18], "block_1_1": 18, "blur": 9, "bmvc": 6, "bn": 14, "bodi": [1, 18], "bool": [6, 7, 8, 9, 10], "boolean": [8, 18], "both": [4, 6, 9, 16, 18], "bottom": [8, 18], "bound": [6, 7, 8, 9, 10, 15, 16, 18], "box": [6, 7, 8, 9, 10, 15, 16, 18], "box_thresh": 18, "bright": 9, "browser": [2, 4], "build": [2, 3, 17], "built": 2, "byte": [7, 18], "c": [3, 7, 10], "c_j": 10, "cach": [2, 6, 13], "cache_sampl": 6, "call": 17, "callabl": [6, 9], "can": [2, 3, 12, 13, 14, 15, 16, 18], "capabl": [2, 11, 18], "case": [6, 10], "cf": 18, "cfg": 18, "challeng": 6, "challenge2_test_task12_imag": 6, 
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
diff --git a/using_doctr/custom_models_training.html b/using_doctr/custom_models_training.html
index 580b4368b7..e664c6a950 100644
--- a/using_doctr/custom_models_training.html
+++ b/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -615,7 +615,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/using_doctr/running_on_aws.html b/using_doctr/running_on_aws.html
index ddb0c3c80f..81c38b49f5 100644
--- a/using_doctr/running_on_aws.html
+++ b/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -358,7 +358,7 @@ AWS Lambda
-
+
diff --git a/using_doctr/sharing_models.html b/using_doctr/sharing_models.html
index 07a3b2f2a3..4f5d1d68a5 100644
--- a/using_doctr/sharing_models.html
+++ b/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -540,7 +540,7 @@ Recognition
-
+
diff --git a/using_doctr/using_contrib_modules.html b/using_doctr/using_contrib_modules.html
index b4a10925e6..cf282ff3a4 100644
--- a/using_doctr/using_contrib_modules.html
+++ b/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -411,7 +411,7 @@ ArtefactDetection
-
+
diff --git a/using_doctr/using_datasets.html b/using_doctr/using_datasets.html
index 4a52df36ba..e30b6d6459 100644
--- a/using_doctr/using_datasets.html
+++ b/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -638,7 +638,7 @@ Data Loading
-
+
diff --git a/using_doctr/using_model_export.html b/using_doctr/using_model_export.html
index 2b30ee63a1..ad9d09ed4c 100644
--- a/using_doctr/using_model_export.html
+++ b/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -463,7 +463,7 @@ Using your ONNX exported model
-
+
diff --git a/using_doctr/using_models.html b/using_doctr/using_models.html
index 13cb06116b..5c80dbf62d 100644
--- a/using_doctr/using_models.html
+++ b/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1249,7 +1249,7 @@ Advanced options
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/cord.html b/v0.1.0/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.0/_modules/doctr/datasets/cord.html
+++ b/v0.1.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/detection.html b/v0.1.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.0/_modules/doctr/datasets/detection.html
+++ b/v0.1.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/funsd.html b/v0.1.0/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.0/_modules/doctr/datasets/funsd.html
+++ b/v0.1.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic03.html b/v0.1.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.0/_modules/doctr/datasets/ic03.html
+++ b/v0.1.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic13.html b/v0.1.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.0/_modules/doctr/datasets/ic13.html
+++ b/v0.1.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiit5k.html b/v0.1.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiithws.html b/v0.1.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/imgur5k.html b/v0.1.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/loader.html b/v0.1.0/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.0/_modules/doctr/datasets/loader.html
+++ b/v0.1.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/mjsynth.html b/v0.1.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ocr.html b/v0.1.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.0/_modules/doctr/datasets/ocr.html
+++ b/v0.1.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/recognition.html b/v0.1.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.0/_modules/doctr/datasets/recognition.html
+++ b/v0.1.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/sroie.html b/v0.1.0/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.0/_modules/doctr/datasets/sroie.html
+++ b/v0.1.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svhn.html b/v0.1.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.0/_modules/doctr/datasets/svhn.html
+++ b/v0.1.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svt.html b/v0.1.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.0/_modules/doctr/datasets/svt.html
+++ b/v0.1.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/synthtext.html b/v0.1.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/utils.html b/v0.1.0/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.0/_modules/doctr/datasets/utils.html
+++ b/v0.1.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/wildreceipt.html b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.0/_modules/doctr/io/elements.html b/v0.1.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.0/_modules/doctr/io/elements.html
+++ b/v0.1.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.0/_modules/doctr/io/html.html b/v0.1.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.0/_modules/doctr/io/html.html
+++ b/v0.1.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/base.html b/v0.1.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.0/_modules/doctr/io/image/base.html
+++ b/v0.1.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/tensorflow.html b/v0.1.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/io/pdf.html b/v0.1.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.0/_modules/doctr/io/pdf.html
+++ b/v0.1.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.0/_modules/doctr/io/reader.html b/v0.1.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.0/_modules/doctr/io/reader.html
+++ b/v0.1.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/zoo.html b/v0.1.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/zoo.html b/v0.1.0/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/factory/hub.html b/v0.1.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.0/_modules/doctr/models/factory/hub.html
+++ b/v0.1.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/zoo.html b/v0.1.0/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/zoo.html b/v0.1.0/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.0/_modules/doctr/models/zoo.html
+++ b/v0.1.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/base.html b/v0.1.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/utils/metrics.html b/v0.1.0/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.0/_modules/doctr/utils/metrics.html
+++ b/v0.1.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.0/_modules/doctr/utils/visualization.html b/v0.1.0/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.0/_modules/doctr/utils/visualization.html
+++ b/v0.1.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.0/_modules/index.html b/v0.1.0/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.0/_modules/index.html
+++ b/v0.1.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.0/_sources/getting_started/installing.rst.txt b/v0.1.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.0/_sources/getting_started/installing.rst.txt
+++ b/v0.1.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.0/_static/basic.css b/v0.1.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.0/_static/basic.css
+++ b/v0.1.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.0/_static/doctools.js b/v0.1.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.0/_static/doctools.js
+++ b/v0.1.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.0/_static/language_data.js b/v0.1.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.0/_static/language_data.js
+++ b/v0.1.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.0/_static/searchtools.js b/v0.1.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.0/_static/searchtools.js
+++ b/v0.1.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.1.0/changelog.html b/v0.1.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.0/changelog.html
+++ b/v0.1.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.0/community/resources.html b/v0.1.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.0/community/resources.html
+++ b/v0.1.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.0/contributing/code_of_conduct.html b/v0.1.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.0/contributing/code_of_conduct.html
+++ b/v0.1.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.0/contributing/contributing.html b/v0.1.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.0/contributing/contributing.html
+++ b/v0.1.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.0/genindex.html b/v0.1.0/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.0/genindex.html
+++ b/v0.1.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.0/getting_started/installing.html b/v0.1.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.0/getting_started/installing.html
+++ b/v0.1.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.0/index.html b/v0.1.0/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.0/index.html
+++ b/v0.1.0/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.0/modules/contrib.html b/v0.1.0/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.0/modules/contrib.html
+++ b/v0.1.0/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.0/modules/datasets.html b/v0.1.0/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.0/modules/datasets.html
+++ b/v0.1.0/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.0/modules/io.html b/v0.1.0/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.0/modules/io.html
+++ b/v0.1.0/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.0/modules/models.html b/v0.1.0/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.0/modules/models.html
+++ b/v0.1.0/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/transforms.html b/v0.1.0/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.0/modules/transforms.html
+++ b/v0.1.0/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({…})  [regenerated minified Sphinx search index omitted: a single-line JSON payload ("alltitles", "docnames", "envversion", "filenames", "indexentries", "objects", "objnames", "objtypes", "terms", "titles", "titleterms") covering the 19 docTR documentation pages: changelog, community resources, code of conduct, contributing, installation, index, the doctr.* module references, notebooks, and the using_doctr guides]
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.0/using_doctr/custom_models_training.html b/v0.1.0/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.0/using_doctr/custom_models_training.html
+++ b/v0.1.0/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.0/using_doctr/running_on_aws.html b/v0.1.0/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.0/using_doctr/running_on_aws.html
+++ b/v0.1.0/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
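As the header comment notes, language_data.js only supplies the language-specific data (stopwords, stemmer, scorer, splitter) that searchtools.js consumes. A minimal sketch of the kind of stopword filtering a consumer could perform — illustrative only, not the actual Sphinx implementation, and significantTerms is a hypothetical helper name:
// Illustrative sketch: how a search layer could use the `stopwords`
// array above. The real query handling lives in searchtools.js, and
// `significantTerms` is a made-up name, not a Sphinx API.
const stopwords = ["a", "and", "are", "as", "at", "be", "is", "the", "to"]; // excerpt of the list above
function significantTerms(query) {
  return query
    .toLowerCase()
    .split(/\s+/)                 // the real splitter is language-specific
    .filter((term) => term.length > 0 && !stopwords.includes(term));
}
// significantTerms("what is the right model") -> ["what", "right", "model"]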
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
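The searchtools.js changes above extend each search result tuple from six to seven fields — the new trailing kind being one of the SearchResultKind values ("index", "object", "text", "title") — and tag each rendered list item with a kind-${kind} class so themes can style result types via CSS selectors such as .kind-title. A standalone sketch of that new result shape (sample values are illustrative, borrowed from this documentation set; this is not code from searchtools.js itself):
// Sketch of the 7-field result shape the patched searchtools.js destructures.
// Sample docname/title values are drawn from this documentation's own pages.
const demoResult = [
  "using_doctr/using_models",        // docname
  "Choosing the right model",        // title
  "#advanced-options",               // anchor
  null,                              // descr
  15,                                // score
  "using_doctr/using_models.html",   // filename
  "title",                           // kind: SearchResultKind.title
];
const [docName, title, anchor, descr, score, filename, kind] = demoResult;
const listItem = document.createElement("li");
// As in the patch: themes can target .kind-index / .kind-object / .kind-text / .kind-title.
listItem.classList.add(`kind-${kind}`);
listItem.textContent = title;
Note also that the ngettext hunk in the same file replaces the "found ${resultCount} page(s)" workaround with a proper singular/plural message pair via Documentation.ngettext, so single-result searches get a grammatically correct status line.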
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
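
The Search.setIndex payload above is Sphinx's client-side search index: "alltitles" maps section titles to [docname_index, anchor] pairs, "docnames" lists the source pages, and "terms" maps stemmed tokens to the indices of the pages containing them (a bare integer when a term occurs on a single page, a list otherwise). A minimal offline-lookup sketch against that structure, in Python; the trimmed payload in `raw` is a hypothetical excerpt, not the full index above, and a real script would first strip the "Search.setIndex(" wrapper before parsing.

import json

# Hand-trimmed excerpt mirroring the structure of the searchindex.js payload above.
raw = '{"docnames": ["changelog", "getting_started/installing", "index"], "terms": {"instal": [1], "python": [1, 2], "changelog": 0}}'
index = json.loads(raw)

def docs_for(term: str) -> list[str]:
    """Return the pages whose body contains the (already-stemmed) term."""
    hits = index["terms"].get(term, [])
    if isinstance(hits, int):  # single-page hits are stored as a bare int
        hits = [hits]
    return [index["docnames"][i] for i in hits]

print(docs_for("instal"))     # ['getting_started/installing']
print(docs_for("changelog"))  # ['changelog']
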
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
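The rewritten constructor above replaces `sample_transforms` with task flags, so one dataset class now serves detection, recognition, and end-to-end training. A minimal usage sketch against that signature (shapes are indicative; boxes are made relative by the `pre_transforms` hook shown in the hunk):

>>> from doctr.datasets import CORD
>>> train_set = CORD(train=True, download=True)
>>> img, target = train_set[0]  # target: dict of (N, 4) boxes and N labels
>>> # use_polygons=True keeps the four (x, y) corners as an (N, 4, 2) array
>>> rotated_set = CORD(train=True, download=True, use_polygons=True)
>>> # recognition_task=True yields (word crop, text label) pairs instead
>>> reco_set = CORD(train=True, download=True, recognition_task=True)
>>> # combining recognition_task and detection_task raises the ValueError above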
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets.core - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
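The deleted `VisionDataset` above handled download, checksum verification, and archive extraction for every dataset. A minimal subclass sketch against that legacy v0.2.0 signature (the URL and class name are hypothetical; `file_hash=None` skips SHA256 verification):

>>> import os
>>> from doctr.datasets.core import VisionDataset  # removed module, v0.2.0 API
>>> class MyZipDataset(VisionDataset):
...     def __init__(self, **kwargs):
...         super().__init__(
...             url="https://example.com/my_dataset.zip",  # hypothetical archive
...             file_name="my_dataset.zip",
...             file_hash=None,
...             extract_archive=True,
...             **kwargs,
...         )
...         # self._root points at the extracted folder; list samples from it
...         self.data = sorted(os.listdir(self._root))
>>> ds = MyZipDataset(download=True)
>>> len(ds)  # AbstractDataset.__len__ returns len(self.data)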
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
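As with CORD, the new FUNSD flags trade the full image-plus-annotations targets for task-specific samples. A short sketch, grounded in the branches above (the recognition branch also filters out crops whose label contains checkbox glyphs):

>>> from doctr.datasets import FUNSD
>>> # detection_task=True: targets are plain (N, 4) float32 box arrays
>>> det_set = FUNSD(train=True, download=True, detection_task=True)
>>> img, boxes = det_set[0]
>>> # recognition_task=True: samples are (word crop, text label) pairs
>>> reco_set = FUNSD(train=False, download=True, recognition_task=True)
>>> crop, label = reco_set[0]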
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
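The updated loader drops the multithreaded fetch in favour of a plain `map`, gains `__len__`, and accepts an explicit `collate_fn` that takes precedence over the dataset-provided one. A usage sketch against that signature:

>>> from doctr.datasets import CORD, DataLoader
>>> train_set = CORD(train=True, download=True)
>>> train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
>>> len(train_loader)  # math.ceil(len(train_set) / 32) since drop_last=False
>>> images, targets = next(iter(train_loader))
>>> # A custom collate_fn overrides dataset.collate_fn, e.g. to keep raw lists:
>>> def list_collate(samples):
...     images, targets = zip(*samples)
...     return list(images), list(targets)
>>> raw_loader = DataLoader(train_set, batch_size=16, collate_fn=list_collate)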
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
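SROIE follows the same pattern, but its annotations ship as 8 comma-separated coordinates per line, which the loop above stacks into an `(N, 4, 2)` array before optionally collapsing it to `xmin, ymin, xmax, ymax`. A short sketch:

>>> from doctr.datasets import SROIE
>>> train_set = SROIE(train=True, download=True)
>>> img, target = train_set[0]
>>> target["boxes"].shape  # (N, 4) straight boxes by default
>>> poly_set = SROIE(train=True, download=True, use_polygons=True)
>>> poly_set[0][1]["boxes"].shape  # (N, 4, 2) corner coordinates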
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionnary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char still not in vocab, return unknown character)
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their class names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
-
-
+
+
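The renamed `encode_string` and the extended `encode_sequences` above make the padding scheme explicit: with a `pad` symbol, every word is followed by one EOS and then padded up to `target_size`. A worked sketch (both special indices must sit outside the vocab, as the guards above enforce):

>>> from doctr.datasets.utils import decode_sequence, encode_sequences
>>> vocab = "abc"
>>> encode_sequences(["ab", "cab"], vocab, target_size=6, eos=3, pad=4)
array([[0, 1, 3, 4, 4, 4],
       [2, 0, 1, 3, 4, 4]], dtype=int32)
>>> decode_sequence([0, 1, 2], vocab)
'abc'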
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents.elements - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
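Before its move to `doctr.io`, the element tree deleted above could be assembled by hand; geometries left unset are resolved to the smallest enclosing box. A sketch against that removed v0.2.0 API:

>>> from doctr.documents.elements import Block, Document, Line, Page, Word
>>> words = [
...     Word("Hello", 0.99, ((0.10, 0.10), (0.30, 0.15))),
...     Word("world", 0.98, ((0.35, 0.10), (0.55, 0.15))),
... ]
>>> page = Page([Block(lines=[Line(words)])], page_idx=0, dimensions=(595, 842))
>>> doc = Document([page])
>>> doc.render()
'Hello world'
>>> doc.export()["pages"][0]["blocks"][0]["lines"][0]["words"][0]["value"]
'Hello'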
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents.reader - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- a PyMuPDF document, whose pages can later be rendered to numpy ndarrays
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 pdf;
- if you want to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
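The deleted reader consolidated every input type behind `DocumentFile`, with PDFs kept as lazy `fitz` documents until rendered. A usage sketch against that removed API (paths are placeholders):

>>> from doctr.documents import DocumentFile
>>> pdf = DocumentFile.from_pdf("path/to/your/doc.pdf")
>>> pages = pdf.as_images()          # H x W x 3 ndarrays, ~144 dpi by default
>>> words = pdf.get_words()          # per page: ((xmin, ymin, xmax, ymax), value)
>>> artefacts = pdf.get_artefacts()  # per page: image bounding boxes
>>> pages = DocumentFile.from_images(["page1.png", "page2.png"])
>>> doc = DocumentFile.from_url("https://www.yoursite.com")  # web page -> PDF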
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
-
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to unshrink polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum number of boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: The first parameter.
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
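To make the unclip step in polygon_to_box concrete, here is a self-contained sketch of the same shapely/pyclipper arithmetic on a 100 x 100 square (values chosen purely for illustration):

import numpy as np
import pyclipper
from shapely.geometry import Polygon

points = np.array([[0, 0], [100, 0], [100, 100], [0, 100]])
poly = Polygon(points)
# offset distance = area * unclip_ratio / perimeter = 10000 * 1.5 / 400 = 37.5 px
distance = poly.area * 1.5 / poly.length
offset = pyclipper.PyclipperOffset()
offset.AddPath(points.tolist(), pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
expanded = np.asarray(offset.Execute(distance)[0])  # grown polygon, ready for cv2.boundingRect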
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1): # accumulate each coarser map into the next finer one
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature map is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.ndarray,
- ys: np.ndarray,
- a: np.ndarray,
- b: np.ndarray,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
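For reference, with $s_1=\lVert p-a\rVert^2$, $s_2=\lVert p-b\rVert^2$ and $s=\lVert a-b\rVert^2$ for a grid point $p$, the arithmetic above evaluates $c = (s - s_1 - s_2)/(2\sqrt{s_1 s_2}) = -\cos\widehat{apb}$ and then $d = \sqrt{s_1 s_2 (1 - c^2)/s} = \lVert p-a\rVert\,\lVert p-b\rVert\,\sin\widehat{apb} \,/\, \lVert a-b\rVert$, i.e. twice the area of triangle $apb$ divided by the base $\lVert a-b\rVert$, the distance from $p$ to the line $(ab)$; wherever $c<0$ the result falls back to $\min(\lVert p-a\rVert, \lVert p-b\rVert)$.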
-
- def draw_thresh_map(
- self,
- polygon: np.ndarray,
- canvas: np.ndarray,
- mask: np.ndarray,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon : array of coord., to draw the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
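Put differently, with $D = \text{area} \cdot (1 - r^2)/\text{perimeter}$ the padding distance ($r$ being shrink_ratio), the update above is $\text{canvas}(x,y) \leftarrow \max\big(\text{canvas}(x,y),\ 1 - d((x,y), \text{polygon})/D\big)$, so the threshold target equals 1 on the polygon boundary and decays linearly to 0 at distance $D$ on either side of it.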
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool) # the `np.bool` alias is deprecated in favour of the builtin
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
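Collecting the pieces, the objective assembled above is $\mathcal{L} = 5\,\mathcal{L}_{\text{balanced-bce}} + \mathcal{L}_{\text{dice}} + 10\,\mathcal{L}_{\ell_1}$: the balanced BCE keeps at most 3 hard negatives per positive via top-k selection, the dice term is evaluated on the differentiable binarization map $\hat{B} = 1/(1 + e^{-50(P - T)})$, and the $\ell_1$ term supervises the threshold map wherever thresh_mask is set.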
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
- def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
-
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binzarized p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num): # labels run from 1 to label_num - 1; label 0 is the background
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
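A self-contained sketch of the connected-component extraction used above, on a toy binarized map with a single blob (the scoring step is omitted; the int32 cast keeps cv2.boundingRect happy across platforms):

import cv2
import numpy as np

bitmap = np.zeros((64, 64), dtype=np.uint8)
bitmap[10:20, 8:40] = 1

num_labels, labels = cv2.connectedComponents(bitmap, connectivity=4)
for label in range(1, num_labels):  # label 0 is the background
    points = np.array(np.where(labels == label)[::-1]).T.astype(np.int32)  # (x, y) pairs
    x, y, w, h = cv2.boundingRect(points)
    print(x, y, w, h)  # -> 8 10 32 10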
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
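Reading call bottom-up: $y_4 = D_4(x_4)$ and $y_k = D_k(y_{k+1} + x_k)$ for $k = 3, 2, 1$, i.e. LinkNet's additive (rather than concatenative) skip connections between encoder and decoder stages.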
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool) # the `np.bool` alias is deprecated in favour of the builtin
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
- def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@ Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
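Since `arch` may now be either a name or an already instantiated model, both call styles below should be equivalent ways to build a predictor; a sketch based on the signature above:

import numpy as np
from doctr.models import detection_predictor, db_resnet50

page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)

# a) by architecture name
predictor = detection_predictor(arch="db_resnet50", pretrained=True)

# b) by passing a model instance (must be a DBNet, LinkNet or FAST model)
model = db_resnet50(pretrained=True)
predictor = detection_predictor(arch=model, assume_straight_pages=True)

out = predictor([page])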
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
-
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
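Whichever converter above is used, the returned bytes can be smoke-tested with the TFLite interpreter; a minimal sketch, where `serialized_model` stands for the output of one of the converters and the 224 x 224 x 3 shape matches the docstring examples (reading the dtype from the input details also covers the int8 case):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_content=serialized_model)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

dummy = np.random.rand(1, 224, 224, 3).astype(input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]["index"])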
-
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
-
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
- with the label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
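For intuition, the greedy rule implemented by tf.nn.ctc_greedy_decoder amounts to collapsing repeated frame labels and then dropping the blank class; a toy re-implementation (illustration only, not the library code):

def greedy_ctc_collapse(frame_labels, blank):
    """Collapse repeated frame labels, then drop blanks."""
    out, prev = [], None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# the blank index is len(vocab) in the model above; here a 3-class toy vocab
print(greedy_ctc_collapse([0, 0, 3, 1, 1, 3, 1], blank=3))  # -> [0, 1, 1]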
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of ground-truth labels for the batch
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
-
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM cells to stack in the decoder
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length: number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Decode logits into character sequences
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example:
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
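
To make the shapes in the deleted AttentionModule concrete: a NumPy sketch of the glimpse computation with made-up sizes (N=1, H=2, W=2, C=3), mirroring the softmax over positions and the attention-weighted sum in `call`:

>>> import numpy as np
>>> N, H, W, C = 1, 2, 2, 3
>>> features = np.random.rand(N, H, W, C)
>>> scores = np.random.rand(N, H * W)  # flattened attention logits
>>> attention = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)  # softmax over H * W positions
>>> glimpse = (features * attention.reshape(N, H, W, 1)).sum(axis=(1, 2))
>>> glimpse.shape  # one attended feature vector per image, i.e. (N, C)
(1, 3)
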
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
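
A short usage sketch of the reworked `recognition_predictor` shown above; the output annotation is illustrative, not a recorded run:

>>> import numpy as np
>>> from doctr.models import recognition_predictor
>>> predictor = recognition_predictor('crnn_vgg16_bn', pretrained=True, symmetric_pad=True)
>>> crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)
>>> out = predictor([crop])  # plausibly a list of (word, confidence) pairs
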
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the general page orientation based on the median line orientation
+ of the segmentation map, then rotates the page before passing it to the detection module again.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the general page orientation based on the median line orientation
+ of the segmentation map, then rotates the page before passing it to the detection module again.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
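
A sketch of calling the new `kie_predictor` with non-default options, under the same assumptions as the docstring example above:

>>> import numpy as np
>>> from doctr.models import kie_predictor
>>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True, assume_straight_pages=False)
>>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([input_page])
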
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example::
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example::
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example::
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example::
- >>> from doctr.transforms import RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
\ No newline at end of file
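
Pulling the deleted transforms together: a sketch of composing several of them, in the doctest style the docstrings above already use:

>>> import tensorflow as tf
>>> from doctr.transforms import Compose, Resize, Normalize, RandomApply, RandomGamma
>>> transfos = Compose([
...     Resize((32, 32)),
...     Normalize(mean=(0.5, 0.5, 0.5), std=(1., 1., 1.)),
...     RandomApply(RandomGamma(), p=.5),
... ])
>>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
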
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
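
A quick illustration of the four tolerance levels, using the ("EUR", "€") pair from the comment above (this assumes `anyascii` transliterates "€" to "EUR"):

>>> string_match('Hello', 'hello')
(False, True, False, True)
>>> string_match('EUR', '€')
(False, False, True, True)
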
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
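
Following the docstring example, the summary plausibly reads as below; the values follow directly from the definitions (only the second pair matches exactly, while both pairs match caselessly):

>>> from doctr.utils import TextMatch
>>> metric = TextMatch()
>>> metric.update(['Hello', 'world'], ['hello', 'world'])
>>> metric.summary()
{'raw': 0.5, 'caseless': 1.0, 'anyascii': 0.5, 'unicase': 1.0}
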
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
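
For instance, two axis-aligned boxes where the second covers half of the first (hand-computed; output formatting approximate):

>>> import numpy as np
>>> box_iou(np.array([[0, 0, 100, 100]]), np.array([[0, 0, 100, 100], [0, 0, 50, 100]]))
array([[1. , 0.5]], dtype=float32)
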
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
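
The rotated variant agrees with the axis-aligned case when squares are given as 4-point polygons (hand-computed: intersection 2, union 6, IoU 1/3; output formatting approximate):

>>> import numpy as np
>>> sq = lambda x0, y0, x1, y1: [[x0, y0], [x1, y0], [x1, y1], [x0, y1]]
>>> polygon_iou(np.array([sq(0, 0, 2, 2)]), np.array([sq(1, 0, 3, 2)]))
array([[0.33333334]], dtype=float32)
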
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
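
A tiny check of the suppression logic (hand-computed): the higher-scoring of two heavily overlapping boxes survives, and the distant box is kept:

>>> import numpy as np
>>> boxes = np.array([
...     [0, 0, 10, 10, 0.9],
...     [1, 1, 10, 10, 0.8],    # IoU with the first box is 0.81 > 0.5 -> suppressed
...     [50, 50, 60, 60, 0.7],
... ])
>>> nms(boxes, thresh=0.5)  # keeps indices 0 and 2
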
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
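
With the docstring example above, the single ground truth overlaps the best prediction with IoU 0.49 (intersection 70 × 70 = 4900, union 10000), just under the 0.5 threshold, so both recall and precision come out to 0.0 (hand-computed sketch):

>>> import numpy as np
>>> from doctr.utils import LocalizationConfusion
>>> metric = LocalizationConfusion(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
>>> recall, precision, mean_iou = metric.summary()
>>> recall, precision
(0.0, 0.0)
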
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between two sequences to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
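The pair assignment above relies on SciPy's Hungarian solver, which minimizes total cost; negating the IoU matrix therefore maximizes the total IoU of the matching. A quick illustration on a toy matrix (not from the source):

>>> import numpy as np
>>> from scipy.optimize import linear_sum_assignment
>>> iou_mat = np.array([[0.7, 0.1], [0.2, 0.0]])  # rows: ground truths, columns: predictions
>>> gt_idx, pred_idx = linear_sum_assignment(-iou_mat)
>>> iou_mat[gt_idx, pred_idx] >= 0.5  # only assigned pairs above the threshold count as matches
array([ True, False])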
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
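The string_match helper is not shown in this hunk; judging from the four counters it fills, it plausibly compares the two strings raw, case-insensitively, after ASCII transliteration, and with both relaxations combined. A hypothetical sketch of those semantics (the anyascii dependency and the exact comparisons are assumptions):

>>> from anyascii import anyascii
>>> def string_match_sketch(word1, word2):
...     raw = word1 == word2  # exact match
...     caseless = word1.lower() == word2.lower()  # case-insensitive
...     ascii_eq = anyascii(word1) == anyascii(word2)  # after transliteration
...     unicase = anyascii(word1).lower() == anyascii(word2).lower()  # both relaxations
...     return raw, caseless, ascii_eq, unicase
>>> string_match_sketch("Élan", "elan")
(False, False, False, True)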
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
- return recall, precision, mean_iou, mean_distance
+ return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
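To make the aggregation concrete: if 4 ground truth boxes and 5 predictions have been accumulated, and 3 assigned pairs pass both the IoU threshold and the class check, summary() returns recall = 3/4 = 0.75, precision = 3/5 = 0.6, and a meanIoU equal to the per-prediction best IoU averaged over all 5 predictions (rounded to 2 decimals).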
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor was also called with preserve_aspect_ratio=True
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
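A quick sanity check of the relative-to-absolute conversion above (made-up geometry on a page of height 100 and width 200; rect_patch is a module-level helper, importable even though it is not listed in __all__):

>>> from doctr.utils.visualization import rect_patch
>>> patch = rect_patch(((0.1, 0.2), (0.5, 0.8)), (100, 200))
>>> patch.get_xy(), patch.get_width(), patch.get_height()
((20.0, 20.0), 80.0, 60.0)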
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor was also called with preserve_aspect_ratio=True
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if geometry.shape != (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
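The dispatch is driven purely by the geometry container: a 2-point tuple yields a rectangle, a 4-point geometry a polygon. A short sketch with made-up coordinates:

>>> import numpy as np
>>> from doctr.utils.visualization import create_obj_patch
>>> create_obj_patch(((0.1, 0.2), (0.5, 0.8)), (100, 200))  # straight box -> Rectangle
>>> create_obj_patch(np.array([[0.1, 0.2], [0.5, 0.2], [0.5, 0.8], [0.1, 0.8]]), (100, 200))  # 4 points -> Polygon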
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mpl Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mpl Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
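A minimal call on a synthetic image (all values made up):

>>> import numpy as np
>>> from doctr.utils.visualization import draw_boxes
>>> image = np.zeros((100, 200, 3), dtype=np.uint8)
>>> boxes = np.array([[0.1, 0.2, 0.5, 0.8]])  # relative (xmin, ymin, xmax, ymax)
>>> draw_boxes(boxes, image, color=(0, 255, 0))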
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
Overview: module code - docTR documentation
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-.. autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same height but in different columns belong to two distinct Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developer mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Whether they are performed jointly or separately, each task corresponds to a type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up; we then measure its average speed over 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.12xlarge AWS instance (Xeon Platinum 8275L CPU) to run the experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
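A rough TensorFlow sketch of those three steps (the target size and, especially, the normalization statistics are placeholders; the actual PreProcessor implementation is not shown here):

>>> import tensorflow as tf
>>> imgs = [tf.random.uniform((512, 384, 3)), tf.random.uniform((768, 512, 3))]
>>> resized = [tf.image.resize(img, (1024, 1024), method="bilinear") for img in imgs]  # 1. may deform
>>> batch = tf.stack(resized, axis=0)  # 2. batch images together
>>> mean, std = 0.785, 0.275  # placeholder training statistics
>>> batch = (batch - mean) / std  # 3. normalize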
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (a binary segmentation map, for instance) into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors let you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 30595 word-level crops, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 32, 128, 3] as a warm-up; we then measure its average speed over 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.12xlarge AWS instance (Xeon Platinum 8275L CPU) to run the experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
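The difference with detection is the aspect-preserving resize followed by zero-padding; a sketch of steps 1-2 under the same assumptions (a (32, 128) target is used for illustration):

>>> import tensorflow as tf
>>> img = tf.random.uniform((64, 200, 3))
>>> scale = min(32 / 64, 128 / 200)  # largest scale that fits the target without deformation
>>> resized = tf.image.resize(img, (int(64 * scale), int(200 * scale)), method="bilinear")
>>> padded = tf.image.pad_to_bounding_box(resized, 0, 0, 32, 128)  # zero-pad to (32, 128)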
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence) into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the predictor, warm up the model, and then measure the average speed of the end-to-end predictor on the datasets with a batch size of 1.
-We used a c5.12xlarge AWS instance (Xeon Platinum 8275L CPU) to run the experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection and one stage of text recognition. The text detection stage produces cropped images that are passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedure. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module groups non-core features that complement the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model's performance.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
Changelog - docTR documentation
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
Community resources - docTR documentation
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
Contributor Covenant Code of Conduct - docTR documentation
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
Contributing to docTR - docTR documentation
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
- doctr.datasets - docTR documentation
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-¶
-
-
-
-
-
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
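As a quick sanity check of the table above (assuming the vocab strings are exposed through a doctr.datasets.VOCABS mapping, as in later versions of the library):
>>> from doctr.datasets import VOCABS  # assumed export
>>> len(VOCABS["french"])
154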
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as a mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
-
-
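For instance, a minimal usage sketch (exact dtype and padding values aside, the output is a padded array of shape (N, target_size)):
>>> from doctr.datasets import encode_sequences
>>> encoded = encode_sequences(["hello", "hi"], vocab="abcdefghijklmnopqrstuvwxyz", target_size=8, eos=-1)
>>> encoded.shape
(2, 8)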
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
-
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page's size.
-
-
-
-
-
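A minimal construction example, using the relative-coordinate convention described above (the values are purely illustrative):
>>> from doctr.documents import Word
>>> word = Word(value="hello", confidence=0.92, geometry=((0.1, 0.1), (0.3, 0.15)))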
-
-
-Line¶
-A Line is a collection of Words that are spatially aligned and meant to be read together (on a two-column page, text at the same height but in different columns forms two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the rotation angle in degrees and the confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
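Putting the hierarchy together, a typical traversal (assuming doc is a Document instance, e.g. the output of an OCR predictor, and that the attributes mirror the constructor arguments above) looks like:
>>> for page in doc.pages:
...     for block in page.blocks:
...         for line in block.lines:
...             print([word.value for word in line.words])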
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert its pages into images in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF file
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages' annotations, each represented as a list of (bounding box, value) tuples
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages' artefacts, each represented as a list of bounding boxes
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architecture's speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
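The three-line usage this refers to looks roughly like the following (the file path is a placeholder; see the installation and usage pages for details):
>>> from doctr.io import DocumentFile
>>> from doctr.models import ocr_predictor
>>> model = ocr_predictor(pretrained=True)
>>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
>>> result = model(doc)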
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
Using your ONNX exported model
-
+
diff --git a/latest/using_doctr/using_models.html b/latest/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/latest/using_doctr/using_models.html
+++ b/latest/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/modules/contrib.html b/modules/contrib.html
index 22b0c508a6..b8878635b6 100644
--- a/modules/contrib.html
+++ b/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -376,7 +376,7 @@ Supported contribution modules
-
+
diff --git a/modules/datasets.html b/modules/datasets.html
index 0fe4b78d48..dfcacbc96e 100644
--- a/modules/datasets.html
+++ b/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1077,7 +1077,7 @@ Returns:
-
+
diff --git a/modules/io.html b/modules/io.html
index 924d292c59..77e9e017bf 100644
--- a/modules/io.html
+++ b/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -756,7 +756,7 @@ Returns:¶
-
+
diff --git a/modules/models.html b/modules/models.html
index bf45d11a71..f4a9833365 100644
--- a/modules/models.html
+++ b/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1598,7 +1598,7 @@ Args:¶
-
+
diff --git a/modules/transforms.html b/modules/transforms.html
index 6d77d16e7b..bc254c867b 100644
--- a/modules/transforms.html
+++ b/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -831,7 +831,7 @@ Args:¶<
-
+
diff --git a/modules/utils.html b/modules/utils.html
index 3dd3ecbd96..6784d81f6f 100644
--- a/modules/utils.html
+++ b/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -711,7 +711,7 @@ Args:¶
-
+
diff --git a/notebooks.html b/notebooks.html
index f3ea994e49..647f73d4eb 100644
--- a/notebooks.html
+++ b/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -387,7 +387,7 @@ docTR Notebooks
-
+
diff --git a/search.html b/search.html
index f0693e2c97..0e0da5efb3 100644
--- a/search.html
+++ b/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -336,7 +336,7 @@
-
+
diff --git a/searchindex.js b/searchindex.js
index 8598997441..df18967072 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[1, "correction"]], "2. Warning": [[1, "warning"]], "3. Temporary Ban": [[1, "temporary-ban"]], "4. Permanent Ban": [[1, "permanent-ban"]], "AWS Lambda": [[13, null]], "Advanced options": [[18, "advanced-options"]], "Args:": [[6, "args"], [6, "id4"], [6, "id7"], [6, "id10"], [6, "id13"], [6, "id16"], [6, "id19"], [6, "id22"], [6, "id25"], [6, "id29"], [6, "id32"], [6, "id37"], [6, "id40"], [6, "id46"], [6, "id49"], [6, "id50"], [6, "id51"], [6, "id54"], [6, "id57"], [6, "id60"], [6, "id61"], [7, "args"], [7, "id2"], [7, "id3"], [7, "id4"], [7, "id5"], [7, "id6"], [7, "id7"], [7, "id10"], [7, "id12"], [7, "id14"], [7, "id16"], [7, "id20"], [7, "id24"], [7, "id28"], [8, "args"], [8, "id3"], [8, "id8"], [8, "id13"], [8, "id17"], [8, "id21"], [8, "id26"], [8, "id31"], [8, "id36"], [8, "id41"], [8, "id46"], [8, "id50"], [8, "id54"], [8, "id59"], [8, "id63"], [8, "id68"], [8, "id73"], [8, "id77"], [8, "id81"], [8, "id85"], [8, "id90"], [8, "id95"], [8, "id99"], [8, "id104"], [8, "id109"], [8, "id114"], [8, "id119"], [8, "id123"], [8, "id127"], [8, "id132"], [8, "id137"], [8, "id142"], [8, "id146"], [8, "id150"], [8, "id155"], [8, "id159"], [8, "id163"], [8, "id167"], [8, "id169"], [8, "id171"], [8, "id173"], [9, "args"], [9, "id1"], [9, "id2"], [9, "id3"], [9, "id4"], [9, "id5"], [9, "id6"], [9, "id7"], [9, "id8"], [9, "id9"], [9, "id10"], [9, "id11"], [9, "id12"], [9, "id13"], [9, "id14"], [9, "id15"], [9, "id16"], [9, "id17"], [9, "id18"], [9, "id19"], [10, "args"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"]], "Artefact": [[7, "artefact"]], "ArtefactDetection": [[15, "artefactdetection"]], "Attribution": [[1, "attribution"]], "Available Datasets": [[16, "available-datasets"]], "Available architectures": [[18, "available-architectures"], [18, "id1"], [18, "id2"]], "Available contribution modules": [[15, "available-contribution-modules"]], "Block": [[7, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[16, null]], "Choosing the right model": [[18, null]], "Classification": [[14, "classification"]], "Code quality": [[2, "code-quality"]], "Code style verification": [[2, "code-style-verification"]], "Codebase structure": [[2, "codebase-structure"]], "Commits": [[2, "commits"]], "Composing transformations": [[9, "composing-transformations"]], "Continuous Integration": [[2, "continuous-integration"]], "Contributing to docTR": [[2, null]], "Contributor Covenant Code of Conduct": [[1, null]], "Custom dataset loader": [[6, "custom-dataset-loader"]], "Custom orientation classification models": [[12, "custom-orientation-classification-models"]], "Data Loading": [[16, "data-loading"]], "Dataloader": [[6, "dataloader"]], "Detection": [[14, "detection"], [16, "detection"]], "Detection predictors": [[18, "detection-predictors"]], "Developer mode installation": [[2, "developer-mode-installation"]], "Developing docTR": [[2, "developing-doctr"]], "Document": [[7, "document"]], "Document structure": [[7, "document-structure"]], "End-to-End OCR": [[18, "end-to-end-ocr"]], "Enforcement": [[1, "enforcement"]], "Enforcement Guidelines": [[1, "enforcement-guidelines"]], "Enforcement Responsibilities": [[1, "enforcement-responsibilities"]], "Export to ONNX": [[17, "export-to-onnx"]], "Feature requests & bug report": [[2, "feature-requests-bug-report"]], "Feedback": [[2, "feedback"]], "File reading": [[7, "file-reading"]], "Half-precision": [[17, "half-precision"]], "Installation": [[3, null]], 
"Integrate contributions into your pipeline": [[15, null]], "Let\u2019s connect": [[2, "let-s-connect"]], "Line": [[7, "line"]], "Loading from Huggingface Hub": [[14, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[12, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[12, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[4, "main-features"]], "Model optimization": [[17, "model-optimization"]], "Model zoo": [[4, "model-zoo"]], "Modifying the documentation": [[2, "modifying-the-documentation"]], "Naming conventions": [[14, "naming-conventions"]], "OCR": [[16, "ocr"]], "Object Detection": [[16, "object-detection"]], "Our Pledge": [[1, "our-pledge"]], "Our Standards": [[1, "our-standards"]], "Page": [[7, "page"]], "Preparing your model for inference": [[17, null]], "Prerequisites": [[3, "prerequisites"]], "Pretrained community models": [[14, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[14, "pushing-to-the-huggingface-hub"]], "Questions": [[2, "questions"]], "Recognition": [[14, "recognition"], [16, "recognition"]], "Recognition predictors": [[18, "recognition-predictors"]], "Returns:": [[6, "returns"], [7, "returns"], [7, "id11"], [7, "id13"], [7, "id15"], [7, "id19"], [7, "id23"], [7, "id27"], [7, "id31"], [8, "returns"], [8, "id6"], [8, "id11"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id29"], [8, "id34"], [8, "id39"], [8, "id44"], [8, "id49"], [8, "id53"], [8, "id57"], [8, "id62"], [8, "id66"], [8, "id71"], [8, "id76"], [8, "id80"], [8, "id84"], [8, "id88"], [8, "id93"], [8, "id98"], [8, "id102"], [8, "id107"], [8, "id112"], [8, "id117"], [8, "id122"], [8, "id126"], [8, "id130"], [8, "id135"], [8, "id140"], [8, "id145"], [8, "id149"], [8, "id153"], [8, "id158"], [8, "id162"], [8, "id166"], [8, "id168"], [8, "id170"], [8, "id172"], [10, "returns"]], "Scope": [[1, "scope"]], "Share your model with the community": [[14, null]], "Supported Vocabs": [[6, "supported-vocabs"]], "Supported contribution modules": [[5, "supported-contribution-modules"]], "Supported datasets": [[4, "supported-datasets"]], "Supported transformations": [[9, "supported-transformations"]], "Synthetic dataset generator": [[6, "synthetic-dataset-generator"], [16, "synthetic-dataset-generator"]], "Task evaluation": [[10, "task-evaluation"]], "Text Detection": [[18, "text-detection"]], "Text Recognition": [[18, "text-recognition"]], "Text detection models": [[4, "text-detection-models"]], "Text recognition models": [[4, "text-recognition-models"]], "Train your own model": [[12, null]], "Two-stage approaches": [[18, "two-stage-approaches"]], "Unit tests": [[2, "unit-tests"]], "Use your own datasets": [[16, "use-your-own-datasets"]], "Using your ONNX exported model": [[17, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[3, "via-conda-only-for-linux"]], "Via Git": [[3, "via-git"]], "Via Python Package": [[3, "via-python-package"]], "Visualization": [[10, "visualization"]], "What should I do with the output?": [[18, "what-should-i-do-with-the-output"]], "Word": [[7, "word"]], "docTR Notebooks": [[11, null]], "docTR Vocabs": [[6, "id62"]], "docTR: Document Text Recognition": [[4, null]], "doctr.contrib": [[5, null]], "doctr.datasets": [[6, null], [6, "datasets"]], "doctr.io": [[7, null]], "doctr.models": [[8, null]], "doctr.models.classification": [[8, "doctr-models-classification"]], "doctr.models.detection": [[8, "doctr-models-detection"]], "doctr.models.factory": [[8, 
"doctr-models-factory"]], "doctr.models.recognition": [[8, "doctr-models-recognition"]], "doctr.models.zoo": [[8, "doctr-models-zoo"]], "doctr.transforms": [[9, null]], "doctr.utils": [[10, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[7, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[7, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[9, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[6, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[9, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[9, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[6, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[6, "doctr.datasets.loader.DataLoader", false]], 
"db_mobilenet_v3_large() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[8, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[6, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[6, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[7, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[7, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[6, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[6, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[9, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[9, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[6, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[6, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[6, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[6, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[6, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[8, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[9, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[7, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[6, "doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() 
(in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[9, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[8, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[6, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[9, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[7, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[9, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[9, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[9, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[9, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[9, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[9, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[9, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[9, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[9, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[9, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[9, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[9, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[7, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[7, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[7, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[6, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[9, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[8, 
"doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[7, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[7, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[6, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[6, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[6, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[6, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[9, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[10, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[6, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[7, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[6, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[6, 0, 1, "", "CORD"], [6, 0, 1, "", "CharacterGenerator"], [6, 0, 1, "", "DetectionDataset"], [6, 0, 1, "", "DocArtefacts"], [6, 0, 1, "", "FUNSD"], [6, 0, 1, "", "IC03"], [6, 0, 1, "", "IC13"], [6, 0, 1, "", "IIIT5K"], [6, 0, 1, "", "IIITHWS"], [6, 0, 1, "", "IMGUR5K"], [6, 0, 1, "", "MJSynth"], [6, 0, 1, "", "OCRDataset"], [6, 0, 1, "", "RecognitionDataset"], [6, 0, 1, "", "SROIE"], [6, 0, 1, "", "SVHN"], [6, 0, 1, "", "SVT"], [6, 0, 1, "", "SynthText"], [6, 0, 1, "", 
"WILDRECEIPT"], [6, 0, 1, "", "WordGenerator"], [6, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[6, 0, 1, "", "DataLoader"]], "doctr.io": [[7, 0, 1, "", "Artefact"], [7, 0, 1, "", "Block"], [7, 0, 1, "", "Document"], [7, 0, 1, "", "DocumentFile"], [7, 0, 1, "", "Line"], [7, 0, 1, "", "Page"], [7, 0, 1, "", "Word"], [7, 1, 1, "", "decode_img_as_tensor"], [7, 1, 1, "", "read_html"], [7, 1, 1, "", "read_img_as_numpy"], [7, 1, 1, "", "read_img_as_tensor"], [7, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[7, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[7, 2, 1, "", "from_images"], [7, 2, 1, "", "from_pdf"], [7, 2, 1, "", "from_url"]], "doctr.io.Page": [[7, 2, 1, "", "show"]], "doctr.models": [[8, 1, 1, "", "kie_predictor"], [8, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[8, 1, 1, "", "crop_orientation_predictor"], [8, 1, 1, "", "magc_resnet31"], [8, 1, 1, "", "mobilenet_v3_large"], [8, 1, 1, "", "mobilenet_v3_large_r"], [8, 1, 1, "", "mobilenet_v3_small"], [8, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [8, 1, 1, "", "mobilenet_v3_small_page_orientation"], [8, 1, 1, "", "mobilenet_v3_small_r"], [8, 1, 1, "", "page_orientation_predictor"], [8, 1, 1, "", "resnet18"], [8, 1, 1, "", "resnet31"], [8, 1, 1, "", "resnet34"], [8, 1, 1, "", "resnet50"], [8, 1, 1, "", "textnet_base"], [8, 1, 1, "", "textnet_small"], [8, 1, 1, "", "textnet_tiny"], [8, 1, 1, "", "vgg16_bn_r"], [8, 1, 1, "", "vit_b"], [8, 1, 1, "", "vit_s"]], "doctr.models.detection": [[8, 1, 1, "", "db_mobilenet_v3_large"], [8, 1, 1, "", "db_resnet50"], [8, 1, 1, "", "detection_predictor"], [8, 1, 1, "", "fast_base"], [8, 1, 1, "", "fast_small"], [8, 1, 1, "", "fast_tiny"], [8, 1, 1, "", "linknet_resnet18"], [8, 1, 1, "", "linknet_resnet34"], [8, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[8, 1, 1, "", "from_hub"], [8, 1, 1, "", "login_to_hub"], [8, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[8, 1, 1, "", "crnn_mobilenet_v3_large"], [8, 1, 1, "", "crnn_mobilenet_v3_small"], [8, 1, 1, "", "crnn_vgg16_bn"], [8, 1, 1, "", "master"], [8, 1, 1, "", "parseq"], [8, 1, 1, "", "recognition_predictor"], [8, 1, 1, "", "sar_resnet31"], [8, 1, 1, "", "vitstr_base"], [8, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[9, 0, 1, "", "ChannelShuffle"], [9, 0, 1, "", "ColorInversion"], [9, 0, 1, "", "Compose"], [9, 0, 1, "", "GaussianBlur"], [9, 0, 1, "", "GaussianNoise"], [9, 0, 1, "", "LambdaTransformation"], [9, 0, 1, "", "Normalize"], [9, 0, 1, "", "OneOf"], [9, 0, 1, "", "RandomApply"], [9, 0, 1, "", "RandomBrightness"], [9, 0, 1, "", "RandomContrast"], [9, 0, 1, "", "RandomCrop"], [9, 0, 1, "", "RandomGamma"], [9, 0, 1, "", "RandomHorizontalFlip"], [9, 0, 1, "", "RandomHue"], [9, 0, 1, "", "RandomJpegQuality"], [9, 0, 1, "", "RandomResize"], [9, 0, 1, "", "RandomRotate"], [9, 0, 1, "", "RandomSaturation"], [9, 0, 1, "", "RandomShadow"], [9, 0, 1, "", "Resize"], [9, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[10, 0, 1, "", "DetectionMetric"], [10, 0, 1, "", "LocalizationConfusion"], [10, 0, 1, "", "OCRMetric"], [10, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.OCRMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.visualization": [[10, 1, 1, "", "visualize_page"]]}, 
"objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [1, 7, 8, 10, 14, 17], "0": [1, 3, 6, 9, 10, 12, 15, 16, 18], "00": 18, "01": 18, "0123456789": 6, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "02562": 8, "03": 18, "035": 18, "0361328125": 18, "04": 18, "05": 18, "06": 18, "06640625": 18, "07": 18, "08": [9, 18], "09": 18, "0966796875": 18, "1": [6, 7, 8, 9, 10, 12, 16, 18], "10": [6, 10, 18], "100": [6, 9, 10, 16, 18], "1000": 18, "101": 6, "1024": [8, 12, 18], "104": 6, "106": 6, "108": 6, "1095": 16, "11": 18, "110": 10, "1107": 16, "114": 6, "115": 6, "1156": 16, "116": 6, "118": 6, "11800h": 18, "11th": 18, "12": 18, "120": 6, "123": 6, "126": 6, "1268": 16, "128": [8, 12, 17, 18], "13": 18, "130": 6, "13068": 16, "131": 6, "1337891": 16, "1357421875": 18, "1396484375": 18, "14": 18, "1420": 18, "14470v1": 6, "149": 16, "15": 18, "150": [10, 18], "1552": 18, "16": [8, 17, 18], "1630859375": 18, "1684": 18, "16x16": 8, "17": 18, "1778": 18, "1782": 18, "18": [8, 18], "185546875": 18, "1900": 18, "1910": 8, "19342": 16, "19370": 16, "195": 6, "19598": 16, "199": 18, "1999": 18, "2": [3, 4, 6, 7, 9, 15, 18], "20": 18, "200": 10, "2000": 16, "2003": [4, 6], "2012": 6, "2013": [4, 6], "2015": 6, "2019": 4, "207901": 16, "21": 18, "2103": 6, "2186": 16, "21888": 16, "22": 18, "224": [8, 9], "225": 9, "22672": 16, "229": [9, 16], "23": 18, "233": 16, "236": 6, "24": 18, "246": 16, "249": 16, "25": 18, "2504": 18, "255": [7, 8, 9, 10, 18], "256": 8, "257": 16, "26": 18, "26032": 16, "264": 12, "27": 18, "2700": 16, "2710": 18, "2749": 12, "28": 18, "287": 12, "29": 18, "296": 12, "299": 12, "2d": 18, "3": [3, 4, 7, 8, 9, 10, 17, 18], "30": 18, "300": 16, "3000": 16, "301": 12, "30595": 18, "30ghz": 18, "31": 8, "32": [6, 8, 9, 12, 16, 17, 18], "3232421875": 18, "33": [9, 18], "33402": 16, "33608": 16, "34": [8, 18], "340": 18, "3456": 18, "3515625": 18, "36": 18, "360": 16, "37": [6, 18], "38": 18, "39": 18, "4": [8, 9, 10, 18], "40": 18, "406": 9, "41": 18, "42": 18, "43": 18, "44": 18, "45": 18, "456": 9, "46": 18, "47": 18, "472": 16, "48": [6, 18], "485": 9, "49": 18, "49377": 16, "5": [6, 9, 10, 15, 18], "50": [8, 16, 18], "51": 18, "51171875": 18, "512": 8, "52": [6, 18], "529": 18, "53": 18, "54": 18, "540": 18, "5478515625": 18, "55": 18, "56": 18, "57": 18, "58": [6, 18], "580": 18, "5810546875": 18, "583": 18, "59": 18, "597": 18, "5k": [4, 6], "5m": 18, "6": [9, 18], "60": 9, "600": [8, 10, 18], "61": 18, "62": 18, "626": 16, "63": 18, "64": [8, 9, 18], "641": 18, "647": 16, "65": 18, "66": 18, "67": 18, "68": 18, "69": 18, "693": 12, "694": 12, "695": 12, "6m": 18, "7": 18, "70": [6, 10, 18], "707470": 16, "71": [6, 18], "7100000": 16, "7141797": 16, "7149": 16, "72": 18, "72dpi": 7, "73": 18, "73257": 16, "74": 18, "75": [9, 18], "7581382": 16, "76": 18, "77": 18, "772": 12, "772875": 16, "78": 18, "785": 12, "79": 18, "793533": 16, "796": 16, "798": 12, "7m": 18, "8": [8, 9, 18], "80": 18, "800": [8, 10, 16, 18], "81": 18, "82": 18, "83": 18, "84": 18, "849": 16, "85": 18, "8564453125": 18, "857": 18, "85875": 16, "86": 18, "8603515625": 18, "87": 18, "8707": 16, "88": 18, "89": 18, "9": [3, 9, 18], "90": 18, "90k": 6, "90kdict32px": 6, "91": 18, "914085328578949": 18, "92": 18, "93": 18, "94": [6, 18], "95": 
[10, 18], "9578408598899841": 18, "96": 18, "97": 18, "98": 18, "99": 18, "9949972033500671": 18, "A": [1, 2, 4, 6, 7, 8, 11, 17], "As": 2, "Be": 18, "Being": 1, "By": 13, "For": [1, 2, 3, 12, 18], "If": [2, 7, 8, 12, 18], "In": [2, 6, 16], "It": [9, 14, 15, 17], "Its": [4, 8], "No": [1, 18], "Of": 6, "Or": [15, 17], "The": [1, 2, 6, 7, 10, 13, 15, 16, 17, 18], "Then": 8, "To": [2, 3, 13, 14, 15, 17, 18], "_": [1, 6, 8], "__call__": 18, "_build": 2, "_i": 10, "ab": 6, "abc": 17, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "abdef": [6, 16], "abl": [16, 18], "about": [1, 16, 18], "abov": 18, "abstractdataset": 6, "abus": 1, "accept": 1, "access": [4, 7, 16, 18], "account": [1, 14], "accur": 18, "accuraci": 10, "achiev": 17, "act": 1, "action": 1, "activ": 4, "ad": [2, 8, 9], "adapt": 1, "add": [9, 10, 14, 18], "add_hook": 18, "add_label": 10, "addit": [2, 3, 7, 15, 18], "addition": [2, 18], "address": [1, 7], "adjust": 9, "advanc": 1, "advantag": 17, "advis": 2, "aesthet": [4, 6], "affect": 1, "after": [14, 18], "ag": 1, "again": 8, "aggreg": [10, 16], "aggress": 1, "align": [1, 7, 9], "all": [1, 2, 5, 6, 7, 9, 10, 15, 16, 18], "allow": [1, 17], "along": 18, "alreadi": [2, 17], "also": [1, 8, 14, 15, 16, 18], "alwai": 16, "an": [1, 2, 4, 6, 7, 8, 10, 15, 17, 18], "analysi": [7, 15], "ancient_greek": 6, "angl": [7, 9], "ani": [1, 6, 7, 8, 9, 10, 17, 18], "annot": 6, "anot": 16, "anoth": [8, 12, 16], "answer": 1, "anyascii": 10, "anyon": 4, "anyth": 15, "api": [2, 4], "apolog": 1, "apologi": 1, "app": 2, "appear": 1, "appli": [1, 6, 9], "applic": [4, 8], "appoint": 1, "appreci": 14, "appropri": [1, 2, 18], "ar": [1, 2, 3, 5, 6, 7, 9, 10, 11, 15, 16, 18], "arab": 6, "arabic_diacrit": 6, "arabic_lett": 6, "arabic_punctu": 6, "arbitrarili": [4, 8], "arch": [8, 14], "architectur": [4, 8, 14, 15], "area": 18, "argument": [6, 7, 8, 10, 12, 18], "around": 1, "arrai": [7, 9, 10], "art": [4, 15], "artefact": [10, 15, 18], "artefact_typ": 7, "artifici": [4, 6], "arxiv": [6, 8], "asarrai": 10, "ascii_lett": 6, "aspect": [4, 8, 9, 18], "assess": 10, "assign": 10, "associ": 7, "assum": 8, "assume_straight_pag": [8, 12, 18], "astyp": [8, 10, 18], "attack": 1, "attend": [4, 8], "attent": [1, 8], "autom": 4, "automat": 18, "autoregress": [4, 8], "avail": [1, 4, 5, 9], "averag": [9, 18], "avoid": [1, 3], "aw": [4, 18], "awar": 18, "azur": 18, "b": [8, 10, 18], "b_j": 10, "back": 2, "backbon": 8, "backend": 18, "background": 16, "bangla": 6, "bar": 15, "bar_cod": 16, "base": [4, 8, 15], "baselin": [4, 8, 18], "batch": [6, 8, 9, 15, 16, 18], "batch_siz": [6, 12, 15, 16, 17], "bblanchon": 3, "bbox": 18, "becaus": 13, "been": [2, 10, 16, 18], "befor": [6, 8, 9, 18], "begin": 10, "behavior": [1, 18], "being": [10, 18], "belong": 18, "benchmark": 18, "best": 1, "better": [11, 18], "between": [9, 10, 18], "bgr": 7, "bilinear": 9, "bin_thresh": 18, "binar": [4, 8, 18], "binari": [7, 17, 18], "bit": 17, "block": [10, 18], "block_1_1": 18, "blur": 9, "bmvc": 6, "bn": 14, "bodi": [1, 18], "bool": [6, 7, 8, 9, 10], "boolean": [8, 18], "both": [4, 6, 9, 16, 18], "bottom": [8, 18], "bound": [6, 7, 8, 9, 10, 15, 16, 18], "box": [6, 7, 8, 9, 10, 15, 16, 18], "box_thresh": 18, "bright": 9, "browser": [2, 4], "build": [2, 3, 17], "built": 2, "byte": [7, 18], "c": [3, 7, 10], "c_j": 10, "cach": [2, 6, 13], "cache_sampl": 6, "call": 17, "callabl": [6, 9], "can": [2, 3, 12, 13, 14, 15, 16, 18], "capabl": [2, 11, 18], "case": [6, 10], "cf": 18, "cfg": 18, "challeng": 6, "challenge2_test_task12_imag": 6, 
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[1, "correction"]], "2. Warning": [[1, "warning"]], "3. Temporary Ban": [[1, "temporary-ban"]], "4. Permanent Ban": [[1, "permanent-ban"]], "AWS Lambda": [[13, null]], "Advanced options": [[18, "advanced-options"]], "Args:": [[6, "args"], [6, "id4"], [6, "id7"], [6, "id10"], [6, "id13"], [6, "id16"], [6, "id19"], [6, "id22"], [6, "id25"], [6, "id29"], [6, "id32"], [6, "id37"], [6, "id40"], [6, "id46"], [6, "id49"], [6, "id50"], [6, "id51"], [6, "id54"], [6, "id57"], [6, "id60"], [6, "id61"], [7, "args"], [7, "id2"], [7, "id3"], [7, "id4"], [7, "id5"], [7, "id6"], [7, "id7"], [7, "id10"], [7, "id12"], [7, "id14"], [7, "id16"], [7, "id20"], [7, "id24"], [7, "id28"], [8, "args"], [8, "id3"], [8, "id8"], [8, "id13"], [8, "id17"], [8, "id21"], [8, "id26"], [8, "id31"], [8, "id36"], [8, "id41"], [8, "id46"], [8, "id50"], [8, "id54"], [8, "id59"], [8, "id63"], [8, "id68"], [8, "id73"], [8, "id77"], [8, "id81"], [8, "id85"], [8, "id90"], [8, "id95"], [8, "id99"], [8, "id104"], [8, "id109"], [8, "id114"], [8, "id119"], [8, "id123"], [8, "id127"], [8, "id132"], [8, "id137"], [8, "id142"], [8, "id146"], [8, "id150"], [8, "id155"], [8, "id159"], [8, "id163"], [8, "id167"], [8, "id169"], [8, "id171"], [8, "id173"], [9, "args"], [9, "id1"], [9, "id2"], [9, "id3"], [9, "id4"], [9, "id5"], [9, "id6"], [9, "id7"], [9, "id8"], [9, "id9"], [9, "id10"], [9, "id11"], [9, "id12"], [9, "id13"], [9, "id14"], [9, "id15"], [9, "id16"], [9, "id17"], [9, "id18"], [9, "id19"], [10, "args"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"]], "Artefact": [[7, "artefact"]], "ArtefactDetection": [[15, "artefactdetection"]], "Attribution": [[1, "attribution"]], "Available Datasets": [[16, "available-datasets"]], "Available architectures": [[18, "available-architectures"], [18, "id1"], [18, "id2"]], "Available contribution modules": [[15, "available-contribution-modules"]], "Block": [[7, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[16, null]], "Choosing the right model": [[18, null]], "Classification": [[14, "classification"]], "Code quality": [[2, "code-quality"]], "Code style verification": [[2, "code-style-verification"]], "Codebase structure": [[2, "codebase-structure"]], "Commits": [[2, "commits"]], "Composing transformations": [[9, "composing-transformations"]], "Continuous Integration": [[2, "continuous-integration"]], "Contributing to docTR": [[2, null]], "Contributor Covenant Code of Conduct": [[1, null]], "Custom dataset loader": [[6, "custom-dataset-loader"]], "Custom orientation classification models": [[12, "custom-orientation-classification-models"]], "Data Loading": [[16, "data-loading"]], "Dataloader": [[6, "dataloader"]], "Detection": [[14, "detection"], [16, "detection"]], "Detection predictors": [[18, "detection-predictors"]], "Developer mode installation": [[2, "developer-mode-installation"]], "Developing docTR": [[2, "developing-doctr"]], "Document": [[7, "document"]], "Document structure": [[7, "document-structure"]], "End-to-End OCR": [[18, "end-to-end-ocr"]], "Enforcement": [[1, "enforcement"]], "Enforcement Guidelines": [[1, "enforcement-guidelines"]], "Enforcement Responsibilities": [[1, "enforcement-responsibilities"]], "Export to ONNX": [[17, "export-to-onnx"]], "Feature requests & bug report": [[2, "feature-requests-bug-report"]], "Feedback": [[2, "feedback"]], "File reading": [[7, "file-reading"]], "Half-precision": [[17, "half-precision"]], "Installation": [[3, null]], 
"Integrate contributions into your pipeline": [[15, null]], "Let\u2019s connect": [[2, "let-s-connect"]], "Line": [[7, "line"]], "Loading from Huggingface Hub": [[14, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[12, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[12, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[4, "main-features"]], "Model optimization": [[17, "model-optimization"]], "Model zoo": [[4, "model-zoo"]], "Modifying the documentation": [[2, "modifying-the-documentation"]], "Naming conventions": [[14, "naming-conventions"]], "OCR": [[16, "ocr"]], "Object Detection": [[16, "object-detection"]], "Our Pledge": [[1, "our-pledge"]], "Our Standards": [[1, "our-standards"]], "Page": [[7, "page"]], "Preparing your model for inference": [[17, null]], "Prerequisites": [[3, "prerequisites"]], "Pretrained community models": [[14, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[14, "pushing-to-the-huggingface-hub"]], "Questions": [[2, "questions"]], "Recognition": [[14, "recognition"], [16, "recognition"]], "Recognition predictors": [[18, "recognition-predictors"]], "Returns:": [[6, "returns"], [7, "returns"], [7, "id11"], [7, "id13"], [7, "id15"], [7, "id19"], [7, "id23"], [7, "id27"], [7, "id31"], [8, "returns"], [8, "id6"], [8, "id11"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id29"], [8, "id34"], [8, "id39"], [8, "id44"], [8, "id49"], [8, "id53"], [8, "id57"], [8, "id62"], [8, "id66"], [8, "id71"], [8, "id76"], [8, "id80"], [8, "id84"], [8, "id88"], [8, "id93"], [8, "id98"], [8, "id102"], [8, "id107"], [8, "id112"], [8, "id117"], [8, "id122"], [8, "id126"], [8, "id130"], [8, "id135"], [8, "id140"], [8, "id145"], [8, "id149"], [8, "id153"], [8, "id158"], [8, "id162"], [8, "id166"], [8, "id168"], [8, "id170"], [8, "id172"], [10, "returns"]], "Scope": [[1, "scope"]], "Share your model with the community": [[14, null]], "Supported Vocabs": [[6, "supported-vocabs"]], "Supported contribution modules": [[5, "supported-contribution-modules"]], "Supported datasets": [[4, "supported-datasets"]], "Supported transformations": [[9, "supported-transformations"]], "Synthetic dataset generator": [[6, "synthetic-dataset-generator"], [16, "synthetic-dataset-generator"]], "Task evaluation": [[10, "task-evaluation"]], "Text Detection": [[18, "text-detection"]], "Text Recognition": [[18, "text-recognition"]], "Text detection models": [[4, "text-detection-models"]], "Text recognition models": [[4, "text-recognition-models"]], "Train your own model": [[12, null]], "Two-stage approaches": [[18, "two-stage-approaches"]], "Unit tests": [[2, "unit-tests"]], "Use your own datasets": [[16, "use-your-own-datasets"]], "Using your ONNX exported model": [[17, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[3, "via-conda-only-for-linux"]], "Via Git": [[3, "via-git"]], "Via Python Package": [[3, "via-python-package"]], "Visualization": [[10, "visualization"]], "What should I do with the output?": [[18, "what-should-i-do-with-the-output"]], "Word": [[7, "word"]], "docTR Notebooks": [[11, null]], "docTR Vocabs": [[6, "id62"]], "docTR: Document Text Recognition": [[4, null]], "doctr.contrib": [[5, null]], "doctr.datasets": [[6, null], [6, "datasets"]], "doctr.io": [[7, null]], "doctr.models": [[8, null]], "doctr.models.classification": [[8, "doctr-models-classification"]], "doctr.models.detection": [[8, "doctr-models-detection"]], "doctr.models.factory": [[8, 
"doctr-models-factory"]], "doctr.models.recognition": [[8, "doctr-models-recognition"]], "doctr.models.zoo": [[8, "doctr-models-zoo"]], "doctr.transforms": [[9, null]], "doctr.utils": [[10, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[7, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[7, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[9, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[6, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[9, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[9, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[6, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[6, "doctr.datasets.loader.DataLoader", false]], 
"db_mobilenet_v3_large() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[8, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[6, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[6, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[7, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[7, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[6, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[6, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[9, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[9, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[6, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[6, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[6, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[6, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[6, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[8, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[9, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[7, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[6, "doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() 
(in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[9, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[8, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[6, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[9, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[7, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[9, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[9, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[9, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[9, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[9, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[9, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[9, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[9, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[9, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[9, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[9, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[9, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[7, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[7, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[7, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[6, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[9, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[8, 
"doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[7, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[7, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[6, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[6, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[6, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[6, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[9, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[10, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[6, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[7, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[6, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[6, 0, 1, "", "CORD"], [6, 0, 1, "", "CharacterGenerator"], [6, 0, 1, "", "DetectionDataset"], [6, 0, 1, "", "DocArtefacts"], [6, 0, 1, "", "FUNSD"], [6, 0, 1, "", "IC03"], [6, 0, 1, "", "IC13"], [6, 0, 1, "", "IIIT5K"], [6, 0, 1, "", "IIITHWS"], [6, 0, 1, "", "IMGUR5K"], [6, 0, 1, "", "MJSynth"], [6, 0, 1, "", "OCRDataset"], [6, 0, 1, "", "RecognitionDataset"], [6, 0, 1, "", "SROIE"], [6, 0, 1, "", "SVHN"], [6, 0, 1, "", "SVT"], [6, 0, 1, "", "SynthText"], [6, 0, 1, "", 
"WILDRECEIPT"], [6, 0, 1, "", "WordGenerator"], [6, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[6, 0, 1, "", "DataLoader"]], "doctr.io": [[7, 0, 1, "", "Artefact"], [7, 0, 1, "", "Block"], [7, 0, 1, "", "Document"], [7, 0, 1, "", "DocumentFile"], [7, 0, 1, "", "Line"], [7, 0, 1, "", "Page"], [7, 0, 1, "", "Word"], [7, 1, 1, "", "decode_img_as_tensor"], [7, 1, 1, "", "read_html"], [7, 1, 1, "", "read_img_as_numpy"], [7, 1, 1, "", "read_img_as_tensor"], [7, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[7, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[7, 2, 1, "", "from_images"], [7, 2, 1, "", "from_pdf"], [7, 2, 1, "", "from_url"]], "doctr.io.Page": [[7, 2, 1, "", "show"]], "doctr.models": [[8, 1, 1, "", "kie_predictor"], [8, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[8, 1, 1, "", "crop_orientation_predictor"], [8, 1, 1, "", "magc_resnet31"], [8, 1, 1, "", "mobilenet_v3_large"], [8, 1, 1, "", "mobilenet_v3_large_r"], [8, 1, 1, "", "mobilenet_v3_small"], [8, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [8, 1, 1, "", "mobilenet_v3_small_page_orientation"], [8, 1, 1, "", "mobilenet_v3_small_r"], [8, 1, 1, "", "page_orientation_predictor"], [8, 1, 1, "", "resnet18"], [8, 1, 1, "", "resnet31"], [8, 1, 1, "", "resnet34"], [8, 1, 1, "", "resnet50"], [8, 1, 1, "", "textnet_base"], [8, 1, 1, "", "textnet_small"], [8, 1, 1, "", "textnet_tiny"], [8, 1, 1, "", "vgg16_bn_r"], [8, 1, 1, "", "vit_b"], [8, 1, 1, "", "vit_s"]], "doctr.models.detection": [[8, 1, 1, "", "db_mobilenet_v3_large"], [8, 1, 1, "", "db_resnet50"], [8, 1, 1, "", "detection_predictor"], [8, 1, 1, "", "fast_base"], [8, 1, 1, "", "fast_small"], [8, 1, 1, "", "fast_tiny"], [8, 1, 1, "", "linknet_resnet18"], [8, 1, 1, "", "linknet_resnet34"], [8, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[8, 1, 1, "", "from_hub"], [8, 1, 1, "", "login_to_hub"], [8, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[8, 1, 1, "", "crnn_mobilenet_v3_large"], [8, 1, 1, "", "crnn_mobilenet_v3_small"], [8, 1, 1, "", "crnn_vgg16_bn"], [8, 1, 1, "", "master"], [8, 1, 1, "", "parseq"], [8, 1, 1, "", "recognition_predictor"], [8, 1, 1, "", "sar_resnet31"], [8, 1, 1, "", "vitstr_base"], [8, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[9, 0, 1, "", "ChannelShuffle"], [9, 0, 1, "", "ColorInversion"], [9, 0, 1, "", "Compose"], [9, 0, 1, "", "GaussianBlur"], [9, 0, 1, "", "GaussianNoise"], [9, 0, 1, "", "LambdaTransformation"], [9, 0, 1, "", "Normalize"], [9, 0, 1, "", "OneOf"], [9, 0, 1, "", "RandomApply"], [9, 0, 1, "", "RandomBrightness"], [9, 0, 1, "", "RandomContrast"], [9, 0, 1, "", "RandomCrop"], [9, 0, 1, "", "RandomGamma"], [9, 0, 1, "", "RandomHorizontalFlip"], [9, 0, 1, "", "RandomHue"], [9, 0, 1, "", "RandomJpegQuality"], [9, 0, 1, "", "RandomResize"], [9, 0, 1, "", "RandomRotate"], [9, 0, 1, "", "RandomSaturation"], [9, 0, 1, "", "RandomShadow"], [9, 0, 1, "", "Resize"], [9, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[10, 0, 1, "", "DetectionMetric"], [10, 0, 1, "", "LocalizationConfusion"], [10, 0, 1, "", "OCRMetric"], [10, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.OCRMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.visualization": [[10, 1, 1, "", "visualize_page"]]}, 
"objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [1, 7, 8, 10, 14, 17], "0": [1, 3, 6, 9, 10, 12, 15, 16, 18], "00": 18, "01": 18, "0123456789": 6, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "02562": 8, "03": 18, "035": 18, "0361328125": 18, "04": 18, "05": 18, "06": 18, "06640625": 18, "07": 18, "08": [9, 18], "09": 18, "0966796875": 18, "1": [6, 7, 8, 9, 10, 12, 16, 18], "10": [6, 10, 18], "100": [6, 9, 10, 16, 18], "1000": 18, "101": 6, "1024": [8, 12, 18], "104": 6, "106": 6, "108": 6, "1095": 16, "11": 18, "110": 10, "1107": 16, "114": 6, "115": 6, "1156": 16, "116": 6, "118": 6, "11800h": 18, "11th": 18, "12": 18, "120": 6, "123": 6, "126": 6, "1268": 16, "128": [8, 12, 17, 18], "13": 18, "130": 6, "13068": 16, "131": 6, "1337891": 16, "1357421875": 18, "1396484375": 18, "14": 18, "1420": 18, "14470v1": 6, "149": 16, "15": 18, "150": [10, 18], "1552": 18, "16": [8, 17, 18], "1630859375": 18, "1684": 18, "16x16": 8, "17": 18, "1778": 18, "1782": 18, "18": [8, 18], "185546875": 18, "1900": 18, "1910": 8, "19342": 16, "19370": 16, "195": 6, "19598": 16, "199": 18, "1999": 18, "2": [3, 4, 6, 7, 9, 15, 18], "20": 18, "200": 10, "2000": 16, "2003": [4, 6], "2012": 6, "2013": [4, 6], "2015": 6, "2019": 4, "207901": 16, "21": 18, "2103": 6, "2186": 16, "21888": 16, "22": 18, "224": [8, 9], "225": 9, "22672": 16, "229": [9, 16], "23": 18, "233": 16, "236": 6, "24": 18, "246": 16, "249": 16, "25": 18, "2504": 18, "255": [7, 8, 9, 10, 18], "256": 8, "257": 16, "26": 18, "26032": 16, "264": 12, "27": 18, "2700": 16, "2710": 18, "2749": 12, "28": 18, "287": 12, "29": 18, "296": 12, "299": 12, "2d": 18, "3": [3, 4, 7, 8, 9, 10, 17, 18], "30": 18, "300": 16, "3000": 16, "301": 12, "30595": 18, "30ghz": 18, "31": 8, "32": [6, 8, 9, 12, 16, 17, 18], "3232421875": 18, "33": [9, 18], "33402": 16, "33608": 16, "34": [8, 18], "340": 18, "3456": 18, "3515625": 18, "36": 18, "360": 16, "37": [6, 18], "38": 18, "39": 18, "4": [8, 9, 10, 18], "40": 18, "406": 9, "41": 18, "42": 18, "43": 18, "44": 18, "45": 18, "456": 9, "46": 18, "47": 18, "472": 16, "48": [6, 18], "485": 9, "49": 18, "49377": 16, "5": [6, 9, 10, 15, 18], "50": [8, 16, 18], "51": 18, "51171875": 18, "512": 8, "52": [6, 18], "529": 18, "53": 18, "54": 18, "540": 18, "5478515625": 18, "55": 18, "56": 18, "57": 18, "58": [6, 18], "580": 18, "5810546875": 18, "583": 18, "59": 18, "597": 18, "5k": [4, 6], "5m": 18, "6": [9, 18], "60": 9, "600": [8, 10, 18], "61": 18, "62": 18, "626": 16, "63": 18, "64": [8, 9, 18], "641": 18, "647": 16, "65": 18, "66": 18, "67": 18, "68": 18, "69": 18, "693": 12, "694": 12, "695": 12, "6m": 18, "7": 18, "70": [6, 10, 18], "707470": 16, "71": [6, 18], "7100000": 16, "7141797": 16, "7149": 16, "72": 18, "72dpi": 7, "73": 18, "73257": 16, "74": 18, "75": [9, 18], "7581382": 16, "76": 18, "77": 18, "772": 12, "772875": 16, "78": 18, "785": 12, "79": 18, "793533": 16, "796": 16, "798": 12, "7m": 18, "8": [8, 9, 18], "80": 18, "800": [8, 10, 16, 18], "81": 18, "82": 18, "83": 18, "84": 18, "849": 16, "85": 18, "8564453125": 18, "857": 18, "85875": 16, "86": 18, "8603515625": 18, "87": 18, "8707": 16, "88": 18, "89": 18, "9": [3, 9, 18], "90": 18, "90k": 6, "90kdict32px": 6, "91": 18, "914085328578949": 18, "92": 18, "93": 18, "94": [6, 18], "95": 
[10, 18], "9578408598899841": 18, "96": 18, "97": 18, "98": 18, "99": 18, "9949972033500671": 18, "A": [1, 2, 4, 6, 7, 8, 11, 17], "As": 2, "Be": 18, "Being": 1, "By": 13, "For": [1, 2, 3, 12, 18], "If": [2, 7, 8, 12, 18], "In": [2, 6, 16], "It": [9, 14, 15, 17], "Its": [4, 8], "No": [1, 18], "Of": 6, "Or": [15, 17], "The": [1, 2, 6, 7, 10, 13, 15, 16, 17, 18], "Then": 8, "To": [2, 3, 13, 14, 15, 17, 18], "_": [1, 6, 8], "__call__": 18, "_build": 2, "_i": 10, "ab": 6, "abc": 17, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "abdef": [6, 16], "abl": [16, 18], "about": [1, 16, 18], "abov": 18, "abstractdataset": 6, "abus": 1, "accept": 1, "access": [4, 7, 16, 18], "account": [1, 14], "accur": 18, "accuraci": 10, "achiev": 17, "act": 1, "action": 1, "activ": 4, "ad": [2, 8, 9], "adapt": 1, "add": [9, 10, 14, 18], "add_hook": 18, "add_label": 10, "addit": [2, 3, 7, 15, 18], "addition": [2, 18], "address": [1, 7], "adjust": 9, "advanc": 1, "advantag": 17, "advis": 2, "aesthet": [4, 6], "affect": 1, "after": [14, 18], "ag": 1, "again": 8, "aggreg": [10, 16], "aggress": 1, "align": [1, 7, 9], "all": [1, 2, 5, 6, 7, 9, 10, 15, 16, 18], "allow": [1, 17], "along": 18, "alreadi": [2, 17], "also": [1, 8, 14, 15, 16, 18], "alwai": 16, "an": [1, 2, 4, 6, 7, 8, 10, 15, 17, 18], "analysi": [7, 15], "ancient_greek": 6, "angl": [7, 9], "ani": [1, 6, 7, 8, 9, 10, 17, 18], "annot": 6, "anot": 16, "anoth": [8, 12, 16], "answer": 1, "anyascii": 10, "anyon": 4, "anyth": 15, "api": [2, 4], "apolog": 1, "apologi": 1, "app": 2, "appear": 1, "appli": [1, 6, 9], "applic": [4, 8], "appoint": 1, "appreci": 14, "appropri": [1, 2, 18], "ar": [1, 2, 3, 5, 6, 7, 9, 10, 11, 15, 16, 18], "arab": 6, "arabic_diacrit": 6, "arabic_lett": 6, "arabic_punctu": 6, "arbitrarili": [4, 8], "arch": [8, 14], "architectur": [4, 8, 14, 15], "area": 18, "argument": [6, 7, 8, 10, 12, 18], "around": 1, "arrai": [7, 9, 10], "art": [4, 15], "artefact": [10, 15, 18], "artefact_typ": 7, "artifici": [4, 6], "arxiv": [6, 8], "asarrai": 10, "ascii_lett": 6, "aspect": [4, 8, 9, 18], "assess": 10, "assign": 10, "associ": 7, "assum": 8, "assume_straight_pag": [8, 12, 18], "astyp": [8, 10, 18], "attack": 1, "attend": [4, 8], "attent": [1, 8], "autom": 4, "automat": 18, "autoregress": [4, 8], "avail": [1, 4, 5, 9], "averag": [9, 18], "avoid": [1, 3], "aw": [4, 18], "awar": 18, "azur": 18, "b": [8, 10, 18], "b_j": 10, "back": 2, "backbon": 8, "backend": 18, "background": 16, "bangla": 6, "bar": 15, "bar_cod": 16, "base": [4, 8, 15], "baselin": [4, 8, 18], "batch": [6, 8, 9, 15, 16, 18], "batch_siz": [6, 12, 15, 16, 17], "bblanchon": 3, "bbox": 18, "becaus": 13, "been": [2, 10, 16, 18], "befor": [6, 8, 9, 18], "begin": 10, "behavior": [1, 18], "being": [10, 18], "belong": 18, "benchmark": 18, "best": 1, "better": [11, 18], "between": [9, 10, 18], "bgr": 7, "bilinear": 9, "bin_thresh": 18, "binar": [4, 8, 18], "binari": [7, 17, 18], "bit": 17, "block": [10, 18], "block_1_1": 18, "blur": 9, "bmvc": 6, "bn": 14, "bodi": [1, 18], "bool": [6, 7, 8, 9, 10], "boolean": [8, 18], "both": [4, 6, 9, 16, 18], "bottom": [8, 18], "bound": [6, 7, 8, 9, 10, 15, 16, 18], "box": [6, 7, 8, 9, 10, 15, 16, 18], "box_thresh": 18, "bright": 9, "browser": [2, 4], "build": [2, 3, 17], "built": 2, "byte": [7, 18], "c": [3, 7, 10], "c_j": 10, "cach": [2, 6, 13], "cache_sampl": 6, "call": 17, "callabl": [6, 9], "can": [2, 3, 12, 13, 14, 15, 16, 18], "capabl": [2, 11, 18], "case": [6, 10], "cf": 18, "cfg": 18, "challeng": 6, "challenge2_test_task12_imag": 6, 
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
diff --git a/using_doctr/custom_models_training.html b/using_doctr/custom_models_training.html
index 580b4368b7..e664c6a950 100644
--- a/using_doctr/custom_models_training.html
+++ b/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -615,7 +615,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/using_doctr/running_on_aws.html b/using_doctr/running_on_aws.html
index ddb0c3c80f..81c38b49f5 100644
--- a/using_doctr/running_on_aws.html
+++ b/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -358,7 +358,7 @@ AWS Lambda
-
+
diff --git a/using_doctr/sharing_models.html b/using_doctr/sharing_models.html
index 07a3b2f2a3..4f5d1d68a5 100644
--- a/using_doctr/sharing_models.html
+++ b/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -540,7 +540,7 @@ Recognition
-
+
diff --git a/using_doctr/using_contrib_modules.html b/using_doctr/using_contrib_modules.html
index b4a10925e6..cf282ff3a4 100644
--- a/using_doctr/using_contrib_modules.html
+++ b/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -411,7 +411,7 @@ ArtefactDetection
-
+
diff --git a/using_doctr/using_datasets.html b/using_doctr/using_datasets.html
index 4a52df36ba..e30b6d6459 100644
--- a/using_doctr/using_datasets.html
+++ b/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -638,7 +638,7 @@ Data Loading
-
+
diff --git a/using_doctr/using_model_export.html b/using_doctr/using_model_export.html
index 2b30ee63a1..ad9d09ed4c 100644
--- a/using_doctr/using_model_export.html
+++ b/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -463,7 +463,7 @@ Using your ONNX exported model
-
+
diff --git a/using_doctr/using_models.html b/using_doctr/using_models.html
index 13cb06116b..5c80dbf62d 100644
--- a/using_doctr/using_models.html
+++ b/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1249,7 +1249,7 @@ Advanced options
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/cord.html b/v0.1.0/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.0/_modules/doctr/datasets/cord.html
+++ b/v0.1.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/detection.html b/v0.1.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.0/_modules/doctr/datasets/detection.html
+++ b/v0.1.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/funsd.html b/v0.1.0/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.0/_modules/doctr/datasets/funsd.html
+++ b/v0.1.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic03.html b/v0.1.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.0/_modules/doctr/datasets/ic03.html
+++ b/v0.1.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic13.html b/v0.1.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.0/_modules/doctr/datasets/ic13.html
+++ b/v0.1.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiit5k.html b/v0.1.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiithws.html b/v0.1.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/imgur5k.html b/v0.1.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/loader.html b/v0.1.0/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.0/_modules/doctr/datasets/loader.html
+++ b/v0.1.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/mjsynth.html b/v0.1.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ocr.html b/v0.1.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.0/_modules/doctr/datasets/ocr.html
+++ b/v0.1.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/recognition.html b/v0.1.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.0/_modules/doctr/datasets/recognition.html
+++ b/v0.1.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/sroie.html b/v0.1.0/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.0/_modules/doctr/datasets/sroie.html
+++ b/v0.1.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svhn.html b/v0.1.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.0/_modules/doctr/datasets/svhn.html
+++ b/v0.1.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svt.html b/v0.1.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.0/_modules/doctr/datasets/svt.html
+++ b/v0.1.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/synthtext.html b/v0.1.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/utils.html b/v0.1.0/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.0/_modules/doctr/datasets/utils.html
+++ b/v0.1.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/wildreceipt.html b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.0/_modules/doctr/io/elements.html b/v0.1.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.0/_modules/doctr/io/elements.html
+++ b/v0.1.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.0/_modules/doctr/io/html.html b/v0.1.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.0/_modules/doctr/io/html.html
+++ b/v0.1.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/base.html b/v0.1.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.0/_modules/doctr/io/image/base.html
+++ b/v0.1.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/tensorflow.html b/v0.1.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/io/pdf.html b/v0.1.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.0/_modules/doctr/io/pdf.html
+++ b/v0.1.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.0/_modules/doctr/io/reader.html b/v0.1.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.0/_modules/doctr/io/reader.html
+++ b/v0.1.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/zoo.html b/v0.1.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/zoo.html b/v0.1.0/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/factory/hub.html b/v0.1.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.0/_modules/doctr/models/factory/hub.html
+++ b/v0.1.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/zoo.html b/v0.1.0/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/zoo.html b/v0.1.0/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.0/_modules/doctr/models/zoo.html
+++ b/v0.1.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/base.html b/v0.1.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/utils/metrics.html b/v0.1.0/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.0/_modules/doctr/utils/metrics.html
+++ b/v0.1.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.0/_modules/doctr/utils/visualization.html b/v0.1.0/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.0/_modules/doctr/utils/visualization.html
+++ b/v0.1.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.0/_modules/index.html b/v0.1.0/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.0/_modules/index.html
+++ b/v0.1.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.0/_sources/getting_started/installing.rst.txt b/v0.1.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.0/_sources/getting_started/installing.rst.txt
+++ b/v0.1.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.0/_static/basic.css b/v0.1.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.0/_static/basic.css
+++ b/v0.1.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.0/_static/doctools.js b/v0.1.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.0/_static/doctools.js
+++ b/v0.1.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.0/_static/language_data.js b/v0.1.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.0/_static/language_data.js
+++ b/v0.1.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.0/_static/searchtools.js b/v0.1.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.0/_static/searchtools.js
+++ b/v0.1.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
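The searchtools.js hunks above tag every search result with a kind ("title", "index", "object", or "text" via the new SearchResultKind enum) and add a matching kind-${kind} class to each result's list item (listItem.classList.add(`kind-${kind}`)). A theme can hook into those classes to style result types differently. A minimal sketch of what such theme CSS could look like, assuming the default ul.search markup from basic.css; the specific styles chosen here are purely illustrative, only the .kind-* class names come from the diff:

    /* Illustrative theme rules: the .kind-* classes are emitted by
       searchtools.js; the property values below are assumptions. */
    ul.search li.kind-title  { font-weight: 600; }        /* page-title matches */
    ul.search li.kind-object { font-family: monospace; }  /* API object matches */
    ul.search li.kind-index  { font-style: italic; }      /* index entries */
    ul.search li.kind-text   { opacity: 0.85; }           /* full-text matches */

The same file's changes also replace the hard-coded "found ${resultCount} page(s)" status string with Documentation.ngettext(singular, plural, count), so translations get proper singular/plural handling, and add role="list" to the results list for assistive technologies.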
diff --git a/v0.1.0/changelog.html b/v0.1.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.0/changelog.html
+++ b/v0.1.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.0/community/resources.html b/v0.1.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.0/community/resources.html
+++ b/v0.1.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.0/contributing/code_of_conduct.html b/v0.1.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.0/contributing/code_of_conduct.html
+++ b/v0.1.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.0/contributing/contributing.html b/v0.1.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.0/contributing/contributing.html
+++ b/v0.1.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.0/genindex.html b/v0.1.0/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.0/genindex.html
+++ b/v0.1.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.0/getting_started/installing.html b/v0.1.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.0/getting_started/installing.html
+++ b/v0.1.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.0/index.html b/v0.1.0/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.0/index.html
+++ b/v0.1.0/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.0/modules/contrib.html b/v0.1.0/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.0/modules/contrib.html
+++ b/v0.1.0/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.0/modules/datasets.html b/v0.1.0/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.0/modules/datasets.html
+++ b/v0.1.0/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.0/modules/io.html b/v0.1.0/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.0/modules/io.html
+++ b/v0.1.0/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.0/modules/models.html b/v0.1.0/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.0/modules/models.html
+++ b/v0.1.0/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/transforms.html b/v0.1.0/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.0/modules/transforms.html
+++ b/v0.1.0/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
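For orientation, the object handed to Search.setIndex in the blob above is Sphinx's prebuilt search index. Below is a minimal sketch of its shape, using only field names visible in this diff plus one assumed key (the stemmed token map, called "terms" here, whose actual key name is cut off in the excerpt); all values are illustrative placeholders, not real index data.

// Hedged sketch of the searchindex.js payload; shape inferred from the
// fields visible above ("alltitles", "docnames", "filenames", "titles",
// "titleterms", ...). Values are placeholders, not real index data.
const Search = { setIndex(idx) { /* stored for later query() calls */ } };
Search.setIndex({
  // section heading -> [document number, anchor id] pairs
  alltitles: { "Installation": [[4, null]] },
  // document number -> source document (extension comes from filenames)
  docnames: ["changelog", "getting_started/installing"],
  filenames: ["changelog.rst", "getting_started/installing.rst"],
  titles: ["Changelog", "Installation"],
  // stemmed token -> document number(s) whose title contains it
  titleterms: { "instal": [4] },
  // assumed key: stemmed token -> document number(s) where it occurs,
  // matching the big token map at the start of the blob above
  terms: { "python": [4] },
});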
diff --git a/v0.1.0/using_doctr/custom_models_training.html b/v0.1.0/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.0/using_doctr/custom_models_training.html
+++ b/v0.1.0/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.0/using_doctr/running_on_aws.html b/v0.1.0/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.0/using_doctr/running_on_aws.html
+++ b/v0.1.0/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
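The SearchResultKind additions above are a lightweight enum built from static getters; the later hunks thread its values through each result tuple and into a CSS class on the rendered list item. A hedged sketch of that flow follows (the tuple values and the example selector are illustrative, not taken from a real index):

// Result tuples gain a trailing "kind" field (see the hunks below):
// [docname, title, anchor, descr, score, filename, kind]
const item = ["getting_started/installing", "Installation", "", null,
              15, "installing.html", SearchResultKind.title];

// _displayItem maps that kind onto a class such as "kind-title", which
// a theme stylesheet could target, e.g. `ul.search li.kind-title {...}`.
const listItem = document.createElement("li");
listItem.classList.add(`kind-${item[6]}`);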
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
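The _finishSearch hunk above replaces the single "${resultCount} page(s)" template with Documentation.ngettext, so the status message pluralizes properly. A rough sketch of the selection this relies on, assuming ngettext(singular, plural, n) picks a form by count (the real helper, which also consults translation catalogs, lives in Sphinx's doctools.js):

// Minimal stand-in for Documentation.ngettext: pick a form by count.
const ngettext = (singular, plural, n) => (n === 1 ? singular : plural);

const resultCount = 1;
const status = ngettext(
  "Search finished, found one page matching the search query.",
  "Search finished, found ${resultCount} pages matching the search query.",
  resultCount,
).replace("${resultCount}", resultCount);
// status === "Search finished, found one page matching the search query."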
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
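Since _displayNextItem consumes results with pop(), the comparator introduced above has to sort ascending by score so the best match comes off the end of the array first. A minimal sketch consistent with that comment (the exact tiebreak in Sphinx may differ; the reverse-alphabetical fallback here is an assumption):

// Ascending score so pop() yields the highest-scored result first;
// the name tiebreak is reversed for the same reason (assumed).
const orderByScoreThenName = (a, b) => {
  if (a[4] !== b[4]) return a[4] - b[4];          // score at index 4
  return a[0] < b[0] ? 1 : a[0] > b[0] ? -1 : 0;  // docname at index 0
};

const results = [
  ["a-doc", "T1", "", null, 5, "a.html", "text"],
  ["b-doc", "T2", "", null, 9, "b.html", "title"],
];
results.sort(orderByScoreThenName);
console.log(results.pop()); // -> the score-9 "title" result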
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
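
The rewritten CORD loader above replaces the TensorFlow-only `__getitem__`/`collate_fn` pair with backend-agnostic targets and adds three mutually exclusive views of the data. A minimal usage sketch, assuming the archives can still be downloaded and that a TensorFlow or PyTorch backend is installed:

>>> from doctr.datasets import CORD
>>> full_set = CORD(train=True, download=True)                          # (image, {"boxes", "labels"}) samples
>>> img, target = full_set[0]
>>> reco_set = CORD(train=True, download=True, recognition_task=True)   # (crop, text) pairs
>>> det_set = CORD(train=True, download=True, detection_task=True)      # (image, boxes) pairs

Setting both task flags at once raises the `ValueError` shown above.
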
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
- doctr.datasets.core - docTR documentation
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
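
FUNSD follows the same pattern; as the list comprehension above shows, `use_polygons=True` expands each straight box into its four (x, y) corners. A sketch under the same download assumption:

>>> from doctr.datasets import FUNSD
>>> poly_set = FUNSD(train=False, download=True, use_polygons=True)
>>> img, target = poly_set[0]
>>> target["boxes"].shape   # (num_words, 4, 2): one (x, y) corner per row
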
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
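
The reworked `DataLoader` drops the multithreaded `workers` option in favor of sequential fetching, accepts an injectable `collate_fn` to override the dataset's default batching, and gains a `__len__`. A sketch mirroring the docstring above, under the same download assumption as before:

>>> from doctr.datasets import CORD, DataLoader
>>> train_set = CORD(train=True, download=True)
>>> train_loader = DataLoader(train_set, shuffle=True, batch_size=32)
>>> n_batches = len(train_loader)              # enabled by the new __len__
>>> images, targets = next(iter(train_loader))
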
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder the 8 flat coordinates into a (4, 2) array of (x, y) corner points
+ # (top left, top right, bottom right, bottom left); blank rows were filtered out above
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
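
The SROIE coordinate handling is worth unpacking: each annotation row holds eight flat values that are reshaped into four (x, y) corners, and without `use_polygons` the corners are reduced to a straight box via per-axis min/max. A standalone numpy sketch with a hypothetical annotation row (values and label are made up):

>>> import numpy as np
>>> row = ["10", "20", "110", "22", "108", "60", "8", "58", "TOTAL", "42.00"]  # hypothetical row
>>> label = ",".join(row[8:])                  # labels may themselves contain commas
>>> corners = np.array(list(map(int, row[:8])), dtype=np.float32).reshape((4, 2))
>>> np.concatenate((corners.min(axis=0), corners.max(axis=0)))  # xmin, ymin, xmax, ymax
array([  8.,  20., 110.,  60.], dtype=float32)
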
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionnary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char still not in vocab, return unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: a array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their class names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
-
-
+
+
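
`encode_sequences` now layers optional SOS/PAD symbols and dynamic sequence length on top of plain EOS padding. A sketch with a toy vocab; as the checks above require, the eos/sos/pad indices must fall outside the vocab:

>>> from doctr.datasets.utils import decode_sequence, encode_sequences
>>> vocab = "abcdefghijklmnopqrstuvwxyz"
>>> encoded = encode_sequences(["cab", "be"], vocab, eos=26, sos=27, pad=28, dynamic_seq_length=True)
>>> encoded.shape   # longest word (3) + 1 EOS + 1 SOS + 1 PAD slot
(2, 6)
>>> decode_sequence([2, 0, 1], vocab)
'cab'
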
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
-
- doctr.documents.elements - docTR documentation
-
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
\ No newline at end of file
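
Although this module is deleted here, the element hierarchy itself (Word → Line → Block → Page → Document) survives the move to `doctr.io` (see the `doctr/io/elements.html` diff below). A minimal sketch against the constructors shown in the removed source, using the post-move import path; later releases may change these signatures:

>>> from doctr.io.elements import Block, Document, Line, Page, Word
>>> words = [
...     Word("Hello", 0.99, ((0.10, 0.10), (0.30, 0.15))),
...     Word("world", 0.98, ((0.32, 0.10), (0.50, 0.15))),
... ]
>>> page = Page(blocks=[Block(lines=[Line(words)])], page_idx=0, dimensions=(842, 595))
>>> doc = Document(pages=[page])
>>> doc.render()
'Hello world'
>>> export = doc.export()   # nested dict of value/confidence/geometry per element
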
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
-
- doctr.documents.reader - docTR documentation
-
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document opened with PyMuPDF (fitz)
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Default goes to 840 x 595 for A4 pdf,
- if you want to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
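
Putting the wrapper together, a hedged end-to-end sketch of the PDF class (the path is hypothetical):

pdf_doc = PDF(read_pdf("sample.pdf"))
pages = pdf_doc.as_images(output_size=(1024, 726))  # list of H x W x 3 ndarrays
words = pdf_doc.get_words()                         # per-page list of (bbox, value) tuples
artefacts = pdf_doc.get_artefacts()                 # per-page list of image bounding boxes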
-
-
-class DocumentFile:
- """Read a document from multiple extensions"""
-
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert them into images in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
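A compact sketch of the three DocumentFile entry points; the paths and URL are hypothetical:

from doctr.documents import DocumentFile

pdf_doc = DocumentFile.from_pdf("sample.pdf")                 # -> PDF wrapper
web_doc = DocumentFile.from_url("https://example.com")        # -> PDF wrapper
pages = DocumentFile.from_images(["page1.png", "page2.png"])  # -> list of H x W x 3 ndarrays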
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to unshrink polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and return a 4-point box
-
- Args:
- points: the polygon vertices, as an array of (x, y) coordinates
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Keep the largest set of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to a ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
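The unclip step in polygon_to_box above follows the DB paper: the outward offset distance is the polygon area times unclip_ratio divided by its perimeter. A numeric sketch for an axis-aligned 100 x 20 box:

# For a 100 x 20 box: area A = 2000, perimeter L = 240.
A, L, unclip_ratio = 2000.0, 240.0, 1.5
D = A * unclip_ratio / L  # = 12.5 pixels of outward offset applied by pyclipper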
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1): # top-down pathway, starting from the coarsest map
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature map is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
-
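compute_distance is the point-to-segment distance obtained via the law of cosines: with d1 = |PA|^2, d2 = |PB|^2 and d = |AB|^2, the perpendicular height of P over [AB] is sqrt(d1 * d2 * sin^2(APB) / d), falling back to the nearest endpoint when the foot of the perpendicular lies outside the segment. A hedged sanity check, relying on compute_distance being a staticmethod:

import numpy as np

xs, ys = np.array([[1.0]]), np.array([[1.0]])      # point P = (1, 1)
a, b = np.array([0.0, 0.0]), np.array([2.0, 0.0])  # segment [AB] on the x-axis
print(DBNet.compute_distance(xs, ys, a, b))        # -> [[1.]], the perpendicular distance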
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon threshold map on a canvas, as described in the DB paper
-
- Args:
- polygon: array of coordinates defining the boundary of the polygon
- canvas: threshold map to fill with polygons
- mask: mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from the LinkNet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num): # valid labels are 1 to label_num - 1, 0 being background
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
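A hedged shape check for decoder_block, assuming the imports at the top of this module: the 1x1 bottleneck, stride-2 transposed convolution and 1x1 expansion double the spatial size while remapping channels.

import tensorflow as tf

block = decoder_block(in_chan=128, out_chan=64)
x = tf.random.uniform((1, 32, 32, 128))
print(block(x).shape)  # expected: (1, 64, 64, 64)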
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image.
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@ Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
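The revised _predictor above also accepts an already-built model as arch; a hedged sketch of that path (the import layout is assumed from the docstring example):

from doctr.models import detection, detection_predictor

model = detection.db_resnet50(pretrained=True)  # a DBNet instance
predictor = detection_predictor(arch=model, assume_straight_pages=True)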
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
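All three converters return a serialized TFLite flatbuffer as bytes; a hedged sketch of persisting one (tf_model and the output path are hypothetical):

serialized = convert_to_fp16(tf_model)  # bytes
with open("model.tflite", "wb") as f:
    f.write(serialized)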
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
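As a toy illustration of what the greedy decode above computes (argmax per step, merge repeats, drop blanks); this is a standalone sketch, not library code:

vocab = "ab"
blank = len(vocab)             # blank index = 2
steps = [0, 0, 2, 1, 1, 2, 0]  # per-step argmax over classes
decoded, prev = [], None
for s in steps:
    if s != blank and s != prev:
        decoded.append(vocab[s])
    prev = s
print("".join(decoded))        # -> "aba"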
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
- with the label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of ground-truth strings for the batch
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
-
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
-        # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
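
A toy shape check for the glimpse computation above (arbitrary values; only the shapes matter, and the unit counts are made up):

    >>> import tensorflow as tf
    >>> attn = AttentionModule(attention_units=32)
    >>> features = tf.random.normal((2, 4, 16, 64))    # (N, H, W, C) feature map
    >>> hidden = tf.random.normal((2, 1, 1, 128))      # (N, 1, 1, rnn_units) hidden state
    >>> attn(features, hidden).shape                   # one C-dim glimpse vector per sample
    TensorShape([2, 64])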
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
-            # embedded_symbol: shape (N, embedding_units)
-            embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
-            logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
-            # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
-            # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
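
Note the trick behind the virtual START symbol: its index (``vocab_size + 1``) is out of range for a one-hot of depth ``vocab_size + 1``, so ``tf.one_hot`` yields an all-zero embedding input at the first decoding step:

    >>> import tensorflow as tf
    >>> vocab_size = 4
    >>> tf.one_hot(tf.constant([vocab_size + 1]), depth=vocab_size + 1).numpy()
    array([[0., 0., 0., 0., 0.]], dtype=float32)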
-
-
-class SAR(RecognitionModel):
-    """Implements a SAR architecture as described in `"Show, Attend and Read: A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
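
The masking relies on ``tf.sequence_mask``, which keeps exactly ``seq_len`` timesteps per word (the extra step added above accounts for ``<eos>``); for instance:

    >>> import tensorflow as tf
    >>> tf.sequence_mask(tf.constant([2, 4]), 5).numpy()
    array([[ True,  True, False, False, False],
           [ True,  True,  True,  True, False]])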
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
-    """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read: A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
-    """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read: A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
-    Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
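
Besides architecture names, the reworked predictor also accepts an already-built model instance, e.g. (a sketch; downloading pretrained weights requires network access):

    >>> import numpy as np
    >>> from doctr.models import crnn_vgg16_bn, recognition_predictor
    >>> reco = recognition_predictor(crnn_vgg16_bn(pretrained=True))
    >>> crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)
    >>> out = reco([crop])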
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+            Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+    >>> from doctr.models import kie_predictor
+    >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+            Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+        kwargs: keyword args of `KIEPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
-
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
-    """Apply a user-defined transformation (callable) to a tensor (image or batch of images)
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
-        >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
-    """Applies the following transformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
-        >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
-    Example::
-        >>> from doctr.transforms import RandomBrightness
-        >>> import tensorflow as tf
-        >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
-    Example::
-        >>> from doctr.transforms import RandomContrast
-        >>> import tensorflow as tf
-        >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
-    Example::
-        >>> from doctr.transforms import RandomSaturation
-        >>> import tensorflow as tf
-        >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
-        >>> from doctr.transforms import RandomHue
-        >>> import tensorflow as tf
-        >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
-    """Randomly performs gamma correction for a tensor (batch of images or image)
-
-    Example::
-        >>> from doctr.transforms import RandomGamma
-        >>> import tensorflow as tf
-        >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
-        >>> from doctr.transforms import RandomJpegQuality
-        >>> import tensorflow as tf
-        >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
-        return f"min_quality={self.min_quality}, max_quality={self.max_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
-        >>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
-        >>> import tensorflow as tf
-        >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
-        >>> from doctr.transforms import RandomApply, RandomGamma
-        >>> import tensorflow as tf
-        >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
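
A quick illustration of the four tolerance levels (accented and differently-cased strings only coincide once lower-cased and transliterated):

    >>> string_match("Café", "cafe")
    (False, False, False, True)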
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
-        ignore_accents: if true, ignore accent errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
            gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
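
A minimal example: a box covering the left half of another yields an IoU of 0.5:

    >>> import numpy as np
    >>> box_iou(np.array([[0, 0, 100, 100]]), np.array([[0, 0, 50, 100]]))
    array([[0.5]])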
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
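
Sanity check: a polygon compared with itself has an IoU of 1:

    >>> import numpy as np
    >>> square = np.array([[[0, 0], [1, 0], [1, 1], [0, 1]]])   # (N, 4, 2)
    >>> polygon_iou(square, square)
    array([[1.]], dtype=float32)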
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+    """Perform non-max suppression, borrowed from `<https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
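
Usage sketch: with an IoU threshold of 0.5, the second box below (IoU of roughly 0.68 with the first) is suppressed by the higher-scoring first one:

    >>> import numpy as np
    >>> boxes = np.array([
    ...     [0, 0, 100, 100, 0.9],
    ...     [10, 10, 110, 110, 0.8],
    ...     [200, 200, 300, 300, 0.7],
    ... ])
    >>> [int(i) for i in nms(boxes, thresh=0.5)]
    [0, 2]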
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
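A minimal sketch of the full update/summary cycle, with relative boxes chosen so the outcome is easy to check by hand: the single ground truth is matched exactly by the first prediction, so recall is 1.0, precision 1/2, and the mean IoU (1.0 + 0.0) / 2 (assuming box_iou computes a standard pairwise IoU matrix):

>>> import numpy as np
>>> metric = LocalizationConfusion(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 0.5, 0.5]]), np.asarray([[0, 0, 0.5, 0.5], [0.6, 0.6, 0.7, 0.7]]))
>>> metric.summary()  # -> (1.0, 0.5, 0.5)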
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
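string_match is imported from elsewhere in the package and not shown in this excerpt. A plausible minimal sketch, under the assumption that the four flags stand for exact, case-insensitive, ASCII-transliterated, and transliterated-caseless equality (the anyascii dependency and the exact semantics are assumptions):

from anyascii import anyascii

def string_match(word1: str, word2: str) -> tuple:
    # Hypothetical sketch -- the library's actual implementation may differ
    raw = word1 == word2                           # exact match
    caseless = word1.lower() == word2.lower()      # case-insensitive match
    ascii_eq = anyascii(word1) == anyascii(word2)  # Unicode folded to ASCII
    unicase = anyascii(word1).lower() == anyascii(word2).lower()  # both relaxations
    return raw, caseless, ascii_eq, unicase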
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
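Mirroring the localization sketch above, a small self-checking example: the matched box also carries an identical string, so the raw recall is 1.0 and the raw precision 0.5 (illustrative values):

>>> import numpy as np
>>> metric = OCRMetric(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 0.5, 0.5]]), np.asarray([[0, 0, 0.5, 0.5], [0.6, 0.6, 0.7, 0.7]]),
>>>               ['hello'], ['hello', 'world'])
>>> recall, precision, mean_iou = metric.summary()
>>> recall['raw'], precision['raw'], mean_iou  # -> (1.0, 0.5, 0.5)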
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
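The same sketch carries over to DetectionMetric, with class indices in place of strings (illustrative values; the matched box also has a matching class, hence recall 1.0 and precision 0.5):

>>> import numpy as np
>>> metric = DetectionMetric(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 0.5, 0.5]]), np.asarray([[0, 0, 0.5, 0.5], [0.6, 0.6, 0.7, 0.7]]),
>>>               np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
>>> metric.summary()  # -> (1.0, 0.5, 0.5)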
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
+
+
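A quick sketch of the dispatch implemented by create_obj_patch, assuming page dimensions of (1024, 768) in (height, width) order (values illustrative):

>>> import numpy as np
>>> # a 2-point tuple is a straight box, routed to rect_patch
>>> rect = create_obj_patch(((0.1, 0.1), (0.3, 0.2)), (1024, 768), label="word", color=(0, 0, 1))
>>> # a (4, 2) array is a rotated box, routed to polygon_patch
>>> poly = create_obj_patch(np.array([[0.1, 0.1], [0.3, 0.1], [0.3, 0.2], [0.1, 0.2]]), (1024, 768))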
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mlp Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mlp Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
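A minimal sketch of draw_boxes on a synthetic page (illustrative; note that draw_boxes only issues imshow/plot calls, so displaying the result is left to the caller):

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> page = np.full((100, 200, 3), 255, dtype=np.uint8)  # blank white page
>>> rel_boxes = np.array([[0.1, 0.2, 0.4, 0.6]])        # one (xmin, ymin, xmax, ymax) relative box
>>> draw_boxes(rel_boxes, page, color=(255, 0, 0))
>>> plt.show()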
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
-
-
+
+
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
-
-
+
+
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save a significant amount of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-.. autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same height but in different columns belong to two distinct Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developer mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Whether they are performed jointly or separately, each task corresponds to a type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.x12large AWS instance (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors let you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 30595 word-level crops, which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.x12large AWS instance (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, we warm-up the model and then we measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.x12large AWS instance (CPU Xeon Platinum 8275L) to perform experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection and one stage of text recognition. The text detection output is used to produce cropped images that are then passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedure. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module regroups non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model performances.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
- doctr.datasets - docTR documentation
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
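-Concrete datasets only need to point this class at their archive. Below is a minimal, illustrative sketch of a subclass
-(the URL and file name are placeholders, not a real dataset):
-
-- Example::
->>> from doctr.datasets.core import VisionDataset
->>> class MyDataset(VisionDataset):
->>>     def __init__(self, **kwargs):
->>>         # placeholder URL and archive name, shown only to illustrate the constructor
->>>         super().__init__(
->>>             url="https://example.com/my_dataset.zip",
->>>             file_name="my_dataset.zip",
->>>             extract_archive=True,
->>>             **kwargs,
->>>         )
-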
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
-
-Data Loading¶
-Each dataset has its own way to load a sample, but batch aggregation and the underlying iterator are handled by a dedicated object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-
-
-
-
-
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
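-For instance, encoding two words with a toy lowercase vocab and padding them to a fixed length
-(an illustrative sketch; the vocab string below is arbitrary):
-
-- Example::
->>> from doctr.datasets import encode_sequences
->>> encoded = encode_sequences(
->>>     sequences=["hello", "world"],
->>>     vocab="abcdefghijklmnopqrstuvwxyz",
->>>     target_size=8,
->>>     eos=-1,
->>> )
->>> encoded.shape  # expected: (2, 8), one padded row per sequence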
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
- doctr.documents - docTR documentation
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents. A composed example building the full hierarchy is given at the end of this section.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same height but in different columns belong to two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and the confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
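-Putting it all together, a minimal sketch building a one-word document by hand (the geometry values are arbitrary,
-and we assume Document takes the list of pages as its single argument, mirroring the classes documented above):
-
-- Example::
->>> from doctr.documents import Word, Line, Block, Page, Document
->>> word = Word(value="hello", confidence=0.99, geometry=((0.1, 0.1), (0.3, 0.2)))
->>> line = Line(words=[word])
->>> block = Block(lines=[line])
->>> page = Page(blocks=[block], page_idx=0, dimensions=(842, 595))
->>> doc = Document(pages=[page])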
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert its pages into images in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF document as a bytes stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read one or several image files and convert them into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
Supported contribution modules
-
+
diff --git a/modules/datasets.html b/modules/datasets.html
index 0fe4b78d48..dfcacbc96e 100644
--- a/modules/datasets.html
+++ b/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1077,7 +1077,7 @@ Returns:
-
+
diff --git a/modules/io.html b/modules/io.html
index 924d292c59..77e9e017bf 100644
--- a/modules/io.html
+++ b/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -756,7 +756,7 @@ Returns:¶
-
+
diff --git a/modules/models.html b/modules/models.html
index bf45d11a71..f4a9833365 100644
--- a/modules/models.html
+++ b/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1598,7 +1598,7 @@ Args:¶
-
+
diff --git a/modules/transforms.html b/modules/transforms.html
index 6d77d16e7b..bc254c867b 100644
--- a/modules/transforms.html
+++ b/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -831,7 +831,7 @@ Args:¶<
-
+
diff --git a/modules/utils.html b/modules/utils.html
index 3dd3ecbd96..6784d81f6f 100644
--- a/modules/utils.html
+++ b/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -711,7 +711,7 @@ Args:¶
-
+
diff --git a/notebooks.html b/notebooks.html
index f3ea994e49..647f73d4eb 100644
--- a/notebooks.html
+++ b/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -387,7 +387,7 @@ docTR Notebooks
-
+
diff --git a/search.html b/search.html
index f0693e2c97..0e0da5efb3 100644
--- a/search.html
+++ b/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -336,7 +336,7 @@
-
+
diff --git a/searchindex.js b/searchindex.js
index 8598997441..df18967072 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[1, "correction"]], "2. Warning": [[1, "warning"]], "3. Temporary Ban": [[1, "temporary-ban"]], "4. Permanent Ban": [[1, "permanent-ban"]], "AWS Lambda": [[13, null]], "Advanced options": [[18, "advanced-options"]], "Args:": [[6, "args"], [6, "id4"], [6, "id7"], [6, "id10"], [6, "id13"], [6, "id16"], [6, "id19"], [6, "id22"], [6, "id25"], [6, "id29"], [6, "id32"], [6, "id37"], [6, "id40"], [6, "id46"], [6, "id49"], [6, "id50"], [6, "id51"], [6, "id54"], [6, "id57"], [6, "id60"], [6, "id61"], [7, "args"], [7, "id2"], [7, "id3"], [7, "id4"], [7, "id5"], [7, "id6"], [7, "id7"], [7, "id10"], [7, "id12"], [7, "id14"], [7, "id16"], [7, "id20"], [7, "id24"], [7, "id28"], [8, "args"], [8, "id3"], [8, "id8"], [8, "id13"], [8, "id17"], [8, "id21"], [8, "id26"], [8, "id31"], [8, "id36"], [8, "id41"], [8, "id46"], [8, "id50"], [8, "id54"], [8, "id59"], [8, "id63"], [8, "id68"], [8, "id73"], [8, "id77"], [8, "id81"], [8, "id85"], [8, "id90"], [8, "id95"], [8, "id99"], [8, "id104"], [8, "id109"], [8, "id114"], [8, "id119"], [8, "id123"], [8, "id127"], [8, "id132"], [8, "id137"], [8, "id142"], [8, "id146"], [8, "id150"], [8, "id155"], [8, "id159"], [8, "id163"], [8, "id167"], [8, "id169"], [8, "id171"], [8, "id173"], [9, "args"], [9, "id1"], [9, "id2"], [9, "id3"], [9, "id4"], [9, "id5"], [9, "id6"], [9, "id7"], [9, "id8"], [9, "id9"], [9, "id10"], [9, "id11"], [9, "id12"], [9, "id13"], [9, "id14"], [9, "id15"], [9, "id16"], [9, "id17"], [9, "id18"], [9, "id19"], [10, "args"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"]], "Artefact": [[7, "artefact"]], "ArtefactDetection": [[15, "artefactdetection"]], "Attribution": [[1, "attribution"]], "Available Datasets": [[16, "available-datasets"]], "Available architectures": [[18, "available-architectures"], [18, "id1"], [18, "id2"]], "Available contribution modules": [[15, "available-contribution-modules"]], "Block": [[7, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[16, null]], "Choosing the right model": [[18, null]], "Classification": [[14, "classification"]], "Code quality": [[2, "code-quality"]], "Code style verification": [[2, "code-style-verification"]], "Codebase structure": [[2, "codebase-structure"]], "Commits": [[2, "commits"]], "Composing transformations": [[9, "composing-transformations"]], "Continuous Integration": [[2, "continuous-integration"]], "Contributing to docTR": [[2, null]], "Contributor Covenant Code of Conduct": [[1, null]], "Custom dataset loader": [[6, "custom-dataset-loader"]], "Custom orientation classification models": [[12, "custom-orientation-classification-models"]], "Data Loading": [[16, "data-loading"]], "Dataloader": [[6, "dataloader"]], "Detection": [[14, "detection"], [16, "detection"]], "Detection predictors": [[18, "detection-predictors"]], "Developer mode installation": [[2, "developer-mode-installation"]], "Developing docTR": [[2, "developing-doctr"]], "Document": [[7, "document"]], "Document structure": [[7, "document-structure"]], "End-to-End OCR": [[18, "end-to-end-ocr"]], "Enforcement": [[1, "enforcement"]], "Enforcement Guidelines": [[1, "enforcement-guidelines"]], "Enforcement Responsibilities": [[1, "enforcement-responsibilities"]], "Export to ONNX": [[17, "export-to-onnx"]], "Feature requests & bug report": [[2, "feature-requests-bug-report"]], "Feedback": [[2, "feedback"]], "File reading": [[7, "file-reading"]], "Half-precision": [[17, "half-precision"]], "Installation": [[3, null]], 
"Integrate contributions into your pipeline": [[15, null]], "Let\u2019s connect": [[2, "let-s-connect"]], "Line": [[7, "line"]], "Loading from Huggingface Hub": [[14, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[12, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[12, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[4, "main-features"]], "Model optimization": [[17, "model-optimization"]], "Model zoo": [[4, "model-zoo"]], "Modifying the documentation": [[2, "modifying-the-documentation"]], "Naming conventions": [[14, "naming-conventions"]], "OCR": [[16, "ocr"]], "Object Detection": [[16, "object-detection"]], "Our Pledge": [[1, "our-pledge"]], "Our Standards": [[1, "our-standards"]], "Page": [[7, "page"]], "Preparing your model for inference": [[17, null]], "Prerequisites": [[3, "prerequisites"]], "Pretrained community models": [[14, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[14, "pushing-to-the-huggingface-hub"]], "Questions": [[2, "questions"]], "Recognition": [[14, "recognition"], [16, "recognition"]], "Recognition predictors": [[18, "recognition-predictors"]], "Returns:": [[6, "returns"], [7, "returns"], [7, "id11"], [7, "id13"], [7, "id15"], [7, "id19"], [7, "id23"], [7, "id27"], [7, "id31"], [8, "returns"], [8, "id6"], [8, "id11"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id29"], [8, "id34"], [8, "id39"], [8, "id44"], [8, "id49"], [8, "id53"], [8, "id57"], [8, "id62"], [8, "id66"], [8, "id71"], [8, "id76"], [8, "id80"], [8, "id84"], [8, "id88"], [8, "id93"], [8, "id98"], [8, "id102"], [8, "id107"], [8, "id112"], [8, "id117"], [8, "id122"], [8, "id126"], [8, "id130"], [8, "id135"], [8, "id140"], [8, "id145"], [8, "id149"], [8, "id153"], [8, "id158"], [8, "id162"], [8, "id166"], [8, "id168"], [8, "id170"], [8, "id172"], [10, "returns"]], "Scope": [[1, "scope"]], "Share your model with the community": [[14, null]], "Supported Vocabs": [[6, "supported-vocabs"]], "Supported contribution modules": [[5, "supported-contribution-modules"]], "Supported datasets": [[4, "supported-datasets"]], "Supported transformations": [[9, "supported-transformations"]], "Synthetic dataset generator": [[6, "synthetic-dataset-generator"], [16, "synthetic-dataset-generator"]], "Task evaluation": [[10, "task-evaluation"]], "Text Detection": [[18, "text-detection"]], "Text Recognition": [[18, "text-recognition"]], "Text detection models": [[4, "text-detection-models"]], "Text recognition models": [[4, "text-recognition-models"]], "Train your own model": [[12, null]], "Two-stage approaches": [[18, "two-stage-approaches"]], "Unit tests": [[2, "unit-tests"]], "Use your own datasets": [[16, "use-your-own-datasets"]], "Using your ONNX exported model": [[17, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[3, "via-conda-only-for-linux"]], "Via Git": [[3, "via-git"]], "Via Python Package": [[3, "via-python-package"]], "Visualization": [[10, "visualization"]], "What should I do with the output?": [[18, "what-should-i-do-with-the-output"]], "Word": [[7, "word"]], "docTR Notebooks": [[11, null]], "docTR Vocabs": [[6, "id62"]], "docTR: Document Text Recognition": [[4, null]], "doctr.contrib": [[5, null]], "doctr.datasets": [[6, null], [6, "datasets"]], "doctr.io": [[7, null]], "doctr.models": [[8, null]], "doctr.models.classification": [[8, "doctr-models-classification"]], "doctr.models.detection": [[8, "doctr-models-detection"]], "doctr.models.factory": [[8, 
"doctr-models-factory"]], "doctr.models.recognition": [[8, "doctr-models-recognition"]], "doctr.models.zoo": [[8, "doctr-models-zoo"]], "doctr.transforms": [[9, null]], "doctr.utils": [[10, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[7, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[7, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[9, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[6, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[9, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[9, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[6, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[6, "doctr.datasets.loader.DataLoader", false]], 
"db_mobilenet_v3_large() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[8, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[6, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[6, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[7, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[7, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[6, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[6, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[9, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[9, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[6, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[6, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[6, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[6, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[6, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[8, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[9, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[7, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[6, "doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() 
(in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[9, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[8, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[6, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[9, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[7, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[9, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[9, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[9, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[9, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[9, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[9, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[9, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[9, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[9, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[9, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[9, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[9, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[7, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[7, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[7, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[6, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[9, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[8, 
"doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[7, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[7, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[6, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[6, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[6, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[6, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[9, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[10, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[6, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[7, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[6, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[6, 0, 1, "", "CORD"], [6, 0, 1, "", "CharacterGenerator"], [6, 0, 1, "", "DetectionDataset"], [6, 0, 1, "", "DocArtefacts"], [6, 0, 1, "", "FUNSD"], [6, 0, 1, "", "IC03"], [6, 0, 1, "", "IC13"], [6, 0, 1, "", "IIIT5K"], [6, 0, 1, "", "IIITHWS"], [6, 0, 1, "", "IMGUR5K"], [6, 0, 1, "", "MJSynth"], [6, 0, 1, "", "OCRDataset"], [6, 0, 1, "", "RecognitionDataset"], [6, 0, 1, "", "SROIE"], [6, 0, 1, "", "SVHN"], [6, 0, 1, "", "SVT"], [6, 0, 1, "", "SynthText"], [6, 0, 1, "", 
"WILDRECEIPT"], [6, 0, 1, "", "WordGenerator"], [6, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[6, 0, 1, "", "DataLoader"]], "doctr.io": [[7, 0, 1, "", "Artefact"], [7, 0, 1, "", "Block"], [7, 0, 1, "", "Document"], [7, 0, 1, "", "DocumentFile"], [7, 0, 1, "", "Line"], [7, 0, 1, "", "Page"], [7, 0, 1, "", "Word"], [7, 1, 1, "", "decode_img_as_tensor"], [7, 1, 1, "", "read_html"], [7, 1, 1, "", "read_img_as_numpy"], [7, 1, 1, "", "read_img_as_tensor"], [7, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[7, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[7, 2, 1, "", "from_images"], [7, 2, 1, "", "from_pdf"], [7, 2, 1, "", "from_url"]], "doctr.io.Page": [[7, 2, 1, "", "show"]], "doctr.models": [[8, 1, 1, "", "kie_predictor"], [8, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[8, 1, 1, "", "crop_orientation_predictor"], [8, 1, 1, "", "magc_resnet31"], [8, 1, 1, "", "mobilenet_v3_large"], [8, 1, 1, "", "mobilenet_v3_large_r"], [8, 1, 1, "", "mobilenet_v3_small"], [8, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [8, 1, 1, "", "mobilenet_v3_small_page_orientation"], [8, 1, 1, "", "mobilenet_v3_small_r"], [8, 1, 1, "", "page_orientation_predictor"], [8, 1, 1, "", "resnet18"], [8, 1, 1, "", "resnet31"], [8, 1, 1, "", "resnet34"], [8, 1, 1, "", "resnet50"], [8, 1, 1, "", "textnet_base"], [8, 1, 1, "", "textnet_small"], [8, 1, 1, "", "textnet_tiny"], [8, 1, 1, "", "vgg16_bn_r"], [8, 1, 1, "", "vit_b"], [8, 1, 1, "", "vit_s"]], "doctr.models.detection": [[8, 1, 1, "", "db_mobilenet_v3_large"], [8, 1, 1, "", "db_resnet50"], [8, 1, 1, "", "detection_predictor"], [8, 1, 1, "", "fast_base"], [8, 1, 1, "", "fast_small"], [8, 1, 1, "", "fast_tiny"], [8, 1, 1, "", "linknet_resnet18"], [8, 1, 1, "", "linknet_resnet34"], [8, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[8, 1, 1, "", "from_hub"], [8, 1, 1, "", "login_to_hub"], [8, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[8, 1, 1, "", "crnn_mobilenet_v3_large"], [8, 1, 1, "", "crnn_mobilenet_v3_small"], [8, 1, 1, "", "crnn_vgg16_bn"], [8, 1, 1, "", "master"], [8, 1, 1, "", "parseq"], [8, 1, 1, "", "recognition_predictor"], [8, 1, 1, "", "sar_resnet31"], [8, 1, 1, "", "vitstr_base"], [8, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[9, 0, 1, "", "ChannelShuffle"], [9, 0, 1, "", "ColorInversion"], [9, 0, 1, "", "Compose"], [9, 0, 1, "", "GaussianBlur"], [9, 0, 1, "", "GaussianNoise"], [9, 0, 1, "", "LambdaTransformation"], [9, 0, 1, "", "Normalize"], [9, 0, 1, "", "OneOf"], [9, 0, 1, "", "RandomApply"], [9, 0, 1, "", "RandomBrightness"], [9, 0, 1, "", "RandomContrast"], [9, 0, 1, "", "RandomCrop"], [9, 0, 1, "", "RandomGamma"], [9, 0, 1, "", "RandomHorizontalFlip"], [9, 0, 1, "", "RandomHue"], [9, 0, 1, "", "RandomJpegQuality"], [9, 0, 1, "", "RandomResize"], [9, 0, 1, "", "RandomRotate"], [9, 0, 1, "", "RandomSaturation"], [9, 0, 1, "", "RandomShadow"], [9, 0, 1, "", "Resize"], [9, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[10, 0, 1, "", "DetectionMetric"], [10, 0, 1, "", "LocalizationConfusion"], [10, 0, 1, "", "OCRMetric"], [10, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.OCRMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.visualization": [[10, 1, 1, "", "visualize_page"]]}, 
"objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [1, 7, 8, 10, 14, 17], "0": [1, 3, 6, 9, 10, 12, 15, 16, 18], "00": 18, "01": 18, "0123456789": 6, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "02562": 8, "03": 18, "035": 18, "0361328125": 18, "04": 18, "05": 18, "06": 18, "06640625": 18, "07": 18, "08": [9, 18], "09": 18, "0966796875": 18, "1": [6, 7, 8, 9, 10, 12, 16, 18], "10": [6, 10, 18], "100": [6, 9, 10, 16, 18], "1000": 18, "101": 6, "1024": [8, 12, 18], "104": 6, "106": 6, "108": 6, "1095": 16, "11": 18, "110": 10, "1107": 16, "114": 6, "115": 6, "1156": 16, "116": 6, "118": 6, "11800h": 18, "11th": 18, "12": 18, "120": 6, "123": 6, "126": 6, "1268": 16, "128": [8, 12, 17, 18], "13": 18, "130": 6, "13068": 16, "131": 6, "1337891": 16, "1357421875": 18, "1396484375": 18, "14": 18, "1420": 18, "14470v1": 6, "149": 16, "15": 18, "150": [10, 18], "1552": 18, "16": [8, 17, 18], "1630859375": 18, "1684": 18, "16x16": 8, "17": 18, "1778": 18, "1782": 18, "18": [8, 18], "185546875": 18, "1900": 18, "1910": 8, "19342": 16, "19370": 16, "195": 6, "19598": 16, "199": 18, "1999": 18, "2": [3, 4, 6, 7, 9, 15, 18], "20": 18, "200": 10, "2000": 16, "2003": [4, 6], "2012": 6, "2013": [4, 6], "2015": 6, "2019": 4, "207901": 16, "21": 18, "2103": 6, "2186": 16, "21888": 16, "22": 18, "224": [8, 9], "225": 9, "22672": 16, "229": [9, 16], "23": 18, "233": 16, "236": 6, "24": 18, "246": 16, "249": 16, "25": 18, "2504": 18, "255": [7, 8, 9, 10, 18], "256": 8, "257": 16, "26": 18, "26032": 16, "264": 12, "27": 18, "2700": 16, "2710": 18, "2749": 12, "28": 18, "287": 12, "29": 18, "296": 12, "299": 12, "2d": 18, "3": [3, 4, 7, 8, 9, 10, 17, 18], "30": 18, "300": 16, "3000": 16, "301": 12, "30595": 18, "30ghz": 18, "31": 8, "32": [6, 8, 9, 12, 16, 17, 18], "3232421875": 18, "33": [9, 18], "33402": 16, "33608": 16, "34": [8, 18], "340": 18, "3456": 18, "3515625": 18, "36": 18, "360": 16, "37": [6, 18], "38": 18, "39": 18, "4": [8, 9, 10, 18], "40": 18, "406": 9, "41": 18, "42": 18, "43": 18, "44": 18, "45": 18, "456": 9, "46": 18, "47": 18, "472": 16, "48": [6, 18], "485": 9, "49": 18, "49377": 16, "5": [6, 9, 10, 15, 18], "50": [8, 16, 18], "51": 18, "51171875": 18, "512": 8, "52": [6, 18], "529": 18, "53": 18, "54": 18, "540": 18, "5478515625": 18, "55": 18, "56": 18, "57": 18, "58": [6, 18], "580": 18, "5810546875": 18, "583": 18, "59": 18, "597": 18, "5k": [4, 6], "5m": 18, "6": [9, 18], "60": 9, "600": [8, 10, 18], "61": 18, "62": 18, "626": 16, "63": 18, "64": [8, 9, 18], "641": 18, "647": 16, "65": 18, "66": 18, "67": 18, "68": 18, "69": 18, "693": 12, "694": 12, "695": 12, "6m": 18, "7": 18, "70": [6, 10, 18], "707470": 16, "71": [6, 18], "7100000": 16, "7141797": 16, "7149": 16, "72": 18, "72dpi": 7, "73": 18, "73257": 16, "74": 18, "75": [9, 18], "7581382": 16, "76": 18, "77": 18, "772": 12, "772875": 16, "78": 18, "785": 12, "79": 18, "793533": 16, "796": 16, "798": 12, "7m": 18, "8": [8, 9, 18], "80": 18, "800": [8, 10, 16, 18], "81": 18, "82": 18, "83": 18, "84": 18, "849": 16, "85": 18, "8564453125": 18, "857": 18, "85875": 16, "86": 18, "8603515625": 18, "87": 18, "8707": 16, "88": 18, "89": 18, "9": [3, 9, 18], "90": 18, "90k": 6, "90kdict32px": 6, "91": 18, "914085328578949": 18, "92": 18, "93": 18, "94": [6, 18], "95": 
[10, 18], "9578408598899841": 18, "96": 18, "97": 18, "98": 18, "99": 18, "9949972033500671": 18, "A": [1, 2, 4, 6, 7, 8, 11, 17], "As": 2, "Be": 18, "Being": 1, "By": 13, "For": [1, 2, 3, 12, 18], "If": [2, 7, 8, 12, 18], "In": [2, 6, 16], "It": [9, 14, 15, 17], "Its": [4, 8], "No": [1, 18], "Of": 6, "Or": [15, 17], "The": [1, 2, 6, 7, 10, 13, 15, 16, 17, 18], "Then": 8, "To": [2, 3, 13, 14, 15, 17, 18], "_": [1, 6, 8], "__call__": 18, "_build": 2, "_i": 10, "ab": 6, "abc": 17, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "abdef": [6, 16], "abl": [16, 18], "about": [1, 16, 18], "abov": 18, "abstractdataset": 6, "abus": 1, "accept": 1, "access": [4, 7, 16, 18], "account": [1, 14], "accur": 18, "accuraci": 10, "achiev": 17, "act": 1, "action": 1, "activ": 4, "ad": [2, 8, 9], "adapt": 1, "add": [9, 10, 14, 18], "add_hook": 18, "add_label": 10, "addit": [2, 3, 7, 15, 18], "addition": [2, 18], "address": [1, 7], "adjust": 9, "advanc": 1, "advantag": 17, "advis": 2, "aesthet": [4, 6], "affect": 1, "after": [14, 18], "ag": 1, "again": 8, "aggreg": [10, 16], "aggress": 1, "align": [1, 7, 9], "all": [1, 2, 5, 6, 7, 9, 10, 15, 16, 18], "allow": [1, 17], "along": 18, "alreadi": [2, 17], "also": [1, 8, 14, 15, 16, 18], "alwai": 16, "an": [1, 2, 4, 6, 7, 8, 10, 15, 17, 18], "analysi": [7, 15], "ancient_greek": 6, "angl": [7, 9], "ani": [1, 6, 7, 8, 9, 10, 17, 18], "annot": 6, "anot": 16, "anoth": [8, 12, 16], "answer": 1, "anyascii": 10, "anyon": 4, "anyth": 15, "api": [2, 4], "apolog": 1, "apologi": 1, "app": 2, "appear": 1, "appli": [1, 6, 9], "applic": [4, 8], "appoint": 1, "appreci": 14, "appropri": [1, 2, 18], "ar": [1, 2, 3, 5, 6, 7, 9, 10, 11, 15, 16, 18], "arab": 6, "arabic_diacrit": 6, "arabic_lett": 6, "arabic_punctu": 6, "arbitrarili": [4, 8], "arch": [8, 14], "architectur": [4, 8, 14, 15], "area": 18, "argument": [6, 7, 8, 10, 12, 18], "around": 1, "arrai": [7, 9, 10], "art": [4, 15], "artefact": [10, 15, 18], "artefact_typ": 7, "artifici": [4, 6], "arxiv": [6, 8], "asarrai": 10, "ascii_lett": 6, "aspect": [4, 8, 9, 18], "assess": 10, "assign": 10, "associ": 7, "assum": 8, "assume_straight_pag": [8, 12, 18], "astyp": [8, 10, 18], "attack": 1, "attend": [4, 8], "attent": [1, 8], "autom": 4, "automat": 18, "autoregress": [4, 8], "avail": [1, 4, 5, 9], "averag": [9, 18], "avoid": [1, 3], "aw": [4, 18], "awar": 18, "azur": 18, "b": [8, 10, 18], "b_j": 10, "back": 2, "backbon": 8, "backend": 18, "background": 16, "bangla": 6, "bar": 15, "bar_cod": 16, "base": [4, 8, 15], "baselin": [4, 8, 18], "batch": [6, 8, 9, 15, 16, 18], "batch_siz": [6, 12, 15, 16, 17], "bblanchon": 3, "bbox": 18, "becaus": 13, "been": [2, 10, 16, 18], "befor": [6, 8, 9, 18], "begin": 10, "behavior": [1, 18], "being": [10, 18], "belong": 18, "benchmark": 18, "best": 1, "better": [11, 18], "between": [9, 10, 18], "bgr": 7, "bilinear": 9, "bin_thresh": 18, "binar": [4, 8, 18], "binari": [7, 17, 18], "bit": 17, "block": [10, 18], "block_1_1": 18, "blur": 9, "bmvc": 6, "bn": 14, "bodi": [1, 18], "bool": [6, 7, 8, 9, 10], "boolean": [8, 18], "both": [4, 6, 9, 16, 18], "bottom": [8, 18], "bound": [6, 7, 8, 9, 10, 15, 16, 18], "box": [6, 7, 8, 9, 10, 15, 16, 18], "box_thresh": 18, "bright": 9, "browser": [2, 4], "build": [2, 3, 17], "built": 2, "byte": [7, 18], "c": [3, 7, 10], "c_j": 10, "cach": [2, 6, 13], "cache_sampl": 6, "call": 17, "callabl": [6, 9], "can": [2, 3, 12, 13, 14, 15, 16, 18], "capabl": [2, 11, 18], "case": [6, 10], "cf": 18, "cfg": 18, "challeng": 6, "challenge2_test_task12_imag": 6, 
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[1, "correction"]], "2. Warning": [[1, "warning"]], "3. Temporary Ban": [[1, "temporary-ban"]], "4. Permanent Ban": [[1, "permanent-ban"]], "AWS Lambda": [[13, null]], "Advanced options": [[18, "advanced-options"]], "Args:": [[6, "args"], [6, "id4"], [6, "id7"], [6, "id10"], [6, "id13"], [6, "id16"], [6, "id19"], [6, "id22"], [6, "id25"], [6, "id29"], [6, "id32"], [6, "id37"], [6, "id40"], [6, "id46"], [6, "id49"], [6, "id50"], [6, "id51"], [6, "id54"], [6, "id57"], [6, "id60"], [6, "id61"], [7, "args"], [7, "id2"], [7, "id3"], [7, "id4"], [7, "id5"], [7, "id6"], [7, "id7"], [7, "id10"], [7, "id12"], [7, "id14"], [7, "id16"], [7, "id20"], [7, "id24"], [7, "id28"], [8, "args"], [8, "id3"], [8, "id8"], [8, "id13"], [8, "id17"], [8, "id21"], [8, "id26"], [8, "id31"], [8, "id36"], [8, "id41"], [8, "id46"], [8, "id50"], [8, "id54"], [8, "id59"], [8, "id63"], [8, "id68"], [8, "id73"], [8, "id77"], [8, "id81"], [8, "id85"], [8, "id90"], [8, "id95"], [8, "id99"], [8, "id104"], [8, "id109"], [8, "id114"], [8, "id119"], [8, "id123"], [8, "id127"], [8, "id132"], [8, "id137"], [8, "id142"], [8, "id146"], [8, "id150"], [8, "id155"], [8, "id159"], [8, "id163"], [8, "id167"], [8, "id169"], [8, "id171"], [8, "id173"], [9, "args"], [9, "id1"], [9, "id2"], [9, "id3"], [9, "id4"], [9, "id5"], [9, "id6"], [9, "id7"], [9, "id8"], [9, "id9"], [9, "id10"], [9, "id11"], [9, "id12"], [9, "id13"], [9, "id14"], [9, "id15"], [9, "id16"], [9, "id17"], [9, "id18"], [9, "id19"], [10, "args"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"]], "Artefact": [[7, "artefact"]], "ArtefactDetection": [[15, "artefactdetection"]], "Attribution": [[1, "attribution"]], "Available Datasets": [[16, "available-datasets"]], "Available architectures": [[18, "available-architectures"], [18, "id1"], [18, "id2"]], "Available contribution modules": [[15, "available-contribution-modules"]], "Block": [[7, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[16, null]], "Choosing the right model": [[18, null]], "Classification": [[14, "classification"]], "Code quality": [[2, "code-quality"]], "Code style verification": [[2, "code-style-verification"]], "Codebase structure": [[2, "codebase-structure"]], "Commits": [[2, "commits"]], "Composing transformations": [[9, "composing-transformations"]], "Continuous Integration": [[2, "continuous-integration"]], "Contributing to docTR": [[2, null]], "Contributor Covenant Code of Conduct": [[1, null]], "Custom dataset loader": [[6, "custom-dataset-loader"]], "Custom orientation classification models": [[12, "custom-orientation-classification-models"]], "Data Loading": [[16, "data-loading"]], "Dataloader": [[6, "dataloader"]], "Detection": [[14, "detection"], [16, "detection"]], "Detection predictors": [[18, "detection-predictors"]], "Developer mode installation": [[2, "developer-mode-installation"]], "Developing docTR": [[2, "developing-doctr"]], "Document": [[7, "document"]], "Document structure": [[7, "document-structure"]], "End-to-End OCR": [[18, "end-to-end-ocr"]], "Enforcement": [[1, "enforcement"]], "Enforcement Guidelines": [[1, "enforcement-guidelines"]], "Enforcement Responsibilities": [[1, "enforcement-responsibilities"]], "Export to ONNX": [[17, "export-to-onnx"]], "Feature requests & bug report": [[2, "feature-requests-bug-report"]], "Feedback": [[2, "feedback"]], "File reading": [[7, "file-reading"]], "Half-precision": [[17, "half-precision"]], "Installation": [[3, null]], 
"Integrate contributions into your pipeline": [[15, null]], "Let\u2019s connect": [[2, "let-s-connect"]], "Line": [[7, "line"]], "Loading from Huggingface Hub": [[14, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[12, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[12, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[4, "main-features"]], "Model optimization": [[17, "model-optimization"]], "Model zoo": [[4, "model-zoo"]], "Modifying the documentation": [[2, "modifying-the-documentation"]], "Naming conventions": [[14, "naming-conventions"]], "OCR": [[16, "ocr"]], "Object Detection": [[16, "object-detection"]], "Our Pledge": [[1, "our-pledge"]], "Our Standards": [[1, "our-standards"]], "Page": [[7, "page"]], "Preparing your model for inference": [[17, null]], "Prerequisites": [[3, "prerequisites"]], "Pretrained community models": [[14, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[14, "pushing-to-the-huggingface-hub"]], "Questions": [[2, "questions"]], "Recognition": [[14, "recognition"], [16, "recognition"]], "Recognition predictors": [[18, "recognition-predictors"]], "Returns:": [[6, "returns"], [7, "returns"], [7, "id11"], [7, "id13"], [7, "id15"], [7, "id19"], [7, "id23"], [7, "id27"], [7, "id31"], [8, "returns"], [8, "id6"], [8, "id11"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id29"], [8, "id34"], [8, "id39"], [8, "id44"], [8, "id49"], [8, "id53"], [8, "id57"], [8, "id62"], [8, "id66"], [8, "id71"], [8, "id76"], [8, "id80"], [8, "id84"], [8, "id88"], [8, "id93"], [8, "id98"], [8, "id102"], [8, "id107"], [8, "id112"], [8, "id117"], [8, "id122"], [8, "id126"], [8, "id130"], [8, "id135"], [8, "id140"], [8, "id145"], [8, "id149"], [8, "id153"], [8, "id158"], [8, "id162"], [8, "id166"], [8, "id168"], [8, "id170"], [8, "id172"], [10, "returns"]], "Scope": [[1, "scope"]], "Share your model with the community": [[14, null]], "Supported Vocabs": [[6, "supported-vocabs"]], "Supported contribution modules": [[5, "supported-contribution-modules"]], "Supported datasets": [[4, "supported-datasets"]], "Supported transformations": [[9, "supported-transformations"]], "Synthetic dataset generator": [[6, "synthetic-dataset-generator"], [16, "synthetic-dataset-generator"]], "Task evaluation": [[10, "task-evaluation"]], "Text Detection": [[18, "text-detection"]], "Text Recognition": [[18, "text-recognition"]], "Text detection models": [[4, "text-detection-models"]], "Text recognition models": [[4, "text-recognition-models"]], "Train your own model": [[12, null]], "Two-stage approaches": [[18, "two-stage-approaches"]], "Unit tests": [[2, "unit-tests"]], "Use your own datasets": [[16, "use-your-own-datasets"]], "Using your ONNX exported model": [[17, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[3, "via-conda-only-for-linux"]], "Via Git": [[3, "via-git"]], "Via Python Package": [[3, "via-python-package"]], "Visualization": [[10, "visualization"]], "What should I do with the output?": [[18, "what-should-i-do-with-the-output"]], "Word": [[7, "word"]], "docTR Notebooks": [[11, null]], "docTR Vocabs": [[6, "id62"]], "docTR: Document Text Recognition": [[4, null]], "doctr.contrib": [[5, null]], "doctr.datasets": [[6, null], [6, "datasets"]], "doctr.io": [[7, null]], "doctr.models": [[8, null]], "doctr.models.classification": [[8, "doctr-models-classification"]], "doctr.models.detection": [[8, "doctr-models-detection"]], "doctr.models.factory": [[8, 
"doctr-models-factory"]], "doctr.models.recognition": [[8, "doctr-models-recognition"]], "doctr.models.zoo": [[8, "doctr-models-zoo"]], "doctr.transforms": [[9, null]], "doctr.utils": [[10, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[7, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[7, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[9, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[6, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[9, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[9, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[6, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[6, "doctr.datasets.loader.DataLoader", false]], 
"db_mobilenet_v3_large() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[8, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[6, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[6, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[7, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[7, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[6, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[6, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[9, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[9, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[6, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[6, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[6, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[6, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[6, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[8, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[9, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[7, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[6, "doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() 
(in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[9, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[8, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[6, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[9, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[7, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[9, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[9, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[9, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[9, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[9, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[9, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[9, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[9, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[9, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[9, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[9, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[9, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[7, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[7, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[7, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[6, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[9, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[8, 
"doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[7, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[7, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[6, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[6, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[6, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[6, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[9, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[10, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[6, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[7, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[6, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[6, 0, 1, "", "CORD"], [6, 0, 1, "", "CharacterGenerator"], [6, 0, 1, "", "DetectionDataset"], [6, 0, 1, "", "DocArtefacts"], [6, 0, 1, "", "FUNSD"], [6, 0, 1, "", "IC03"], [6, 0, 1, "", "IC13"], [6, 0, 1, "", "IIIT5K"], [6, 0, 1, "", "IIITHWS"], [6, 0, 1, "", "IMGUR5K"], [6, 0, 1, "", "MJSynth"], [6, 0, 1, "", "OCRDataset"], [6, 0, 1, "", "RecognitionDataset"], [6, 0, 1, "", "SROIE"], [6, 0, 1, "", "SVHN"], [6, 0, 1, "", "SVT"], [6, 0, 1, "", "SynthText"], [6, 0, 1, "", 
"WILDRECEIPT"], [6, 0, 1, "", "WordGenerator"], [6, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[6, 0, 1, "", "DataLoader"]], "doctr.io": [[7, 0, 1, "", "Artefact"], [7, 0, 1, "", "Block"], [7, 0, 1, "", "Document"], [7, 0, 1, "", "DocumentFile"], [7, 0, 1, "", "Line"], [7, 0, 1, "", "Page"], [7, 0, 1, "", "Word"], [7, 1, 1, "", "decode_img_as_tensor"], [7, 1, 1, "", "read_html"], [7, 1, 1, "", "read_img_as_numpy"], [7, 1, 1, "", "read_img_as_tensor"], [7, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[7, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[7, 2, 1, "", "from_images"], [7, 2, 1, "", "from_pdf"], [7, 2, 1, "", "from_url"]], "doctr.io.Page": [[7, 2, 1, "", "show"]], "doctr.models": [[8, 1, 1, "", "kie_predictor"], [8, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[8, 1, 1, "", "crop_orientation_predictor"], [8, 1, 1, "", "magc_resnet31"], [8, 1, 1, "", "mobilenet_v3_large"], [8, 1, 1, "", "mobilenet_v3_large_r"], [8, 1, 1, "", "mobilenet_v3_small"], [8, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [8, 1, 1, "", "mobilenet_v3_small_page_orientation"], [8, 1, 1, "", "mobilenet_v3_small_r"], [8, 1, 1, "", "page_orientation_predictor"], [8, 1, 1, "", "resnet18"], [8, 1, 1, "", "resnet31"], [8, 1, 1, "", "resnet34"], [8, 1, 1, "", "resnet50"], [8, 1, 1, "", "textnet_base"], [8, 1, 1, "", "textnet_small"], [8, 1, 1, "", "textnet_tiny"], [8, 1, 1, "", "vgg16_bn_r"], [8, 1, 1, "", "vit_b"], [8, 1, 1, "", "vit_s"]], "doctr.models.detection": [[8, 1, 1, "", "db_mobilenet_v3_large"], [8, 1, 1, "", "db_resnet50"], [8, 1, 1, "", "detection_predictor"], [8, 1, 1, "", "fast_base"], [8, 1, 1, "", "fast_small"], [8, 1, 1, "", "fast_tiny"], [8, 1, 1, "", "linknet_resnet18"], [8, 1, 1, "", "linknet_resnet34"], [8, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[8, 1, 1, "", "from_hub"], [8, 1, 1, "", "login_to_hub"], [8, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[8, 1, 1, "", "crnn_mobilenet_v3_large"], [8, 1, 1, "", "crnn_mobilenet_v3_small"], [8, 1, 1, "", "crnn_vgg16_bn"], [8, 1, 1, "", "master"], [8, 1, 1, "", "parseq"], [8, 1, 1, "", "recognition_predictor"], [8, 1, 1, "", "sar_resnet31"], [8, 1, 1, "", "vitstr_base"], [8, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[9, 0, 1, "", "ChannelShuffle"], [9, 0, 1, "", "ColorInversion"], [9, 0, 1, "", "Compose"], [9, 0, 1, "", "GaussianBlur"], [9, 0, 1, "", "GaussianNoise"], [9, 0, 1, "", "LambdaTransformation"], [9, 0, 1, "", "Normalize"], [9, 0, 1, "", "OneOf"], [9, 0, 1, "", "RandomApply"], [9, 0, 1, "", "RandomBrightness"], [9, 0, 1, "", "RandomContrast"], [9, 0, 1, "", "RandomCrop"], [9, 0, 1, "", "RandomGamma"], [9, 0, 1, "", "RandomHorizontalFlip"], [9, 0, 1, "", "RandomHue"], [9, 0, 1, "", "RandomJpegQuality"], [9, 0, 1, "", "RandomResize"], [9, 0, 1, "", "RandomRotate"], [9, 0, 1, "", "RandomSaturation"], [9, 0, 1, "", "RandomShadow"], [9, 0, 1, "", "Resize"], [9, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[10, 0, 1, "", "DetectionMetric"], [10, 0, 1, "", "LocalizationConfusion"], [10, 0, 1, "", "OCRMetric"], [10, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.OCRMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.visualization": [[10, 1, 1, "", "visualize_page"]]}, 
"objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [1, 7, 8, 10, 14, 17], "0": [1, 3, 6, 9, 10, 12, 15, 16, 18], "00": 18, "01": 18, "0123456789": 6, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "02562": 8, "03": 18, "035": 18, "0361328125": 18, "04": 18, "05": 18, "06": 18, "06640625": 18, "07": 18, "08": [9, 18], "09": 18, "0966796875": 18, "1": [6, 7, 8, 9, 10, 12, 16, 18], "10": [6, 10, 18], "100": [6, 9, 10, 16, 18], "1000": 18, "101": 6, "1024": [8, 12, 18], "104": 6, "106": 6, "108": 6, "1095": 16, "11": 18, "110": 10, "1107": 16, "114": 6, "115": 6, "1156": 16, "116": 6, "118": 6, "11800h": 18, "11th": 18, "12": 18, "120": 6, "123": 6, "126": 6, "1268": 16, "128": [8, 12, 17, 18], "13": 18, "130": 6, "13068": 16, "131": 6, "1337891": 16, "1357421875": 18, "1396484375": 18, "14": 18, "1420": 18, "14470v1": 6, "149": 16, "15": 18, "150": [10, 18], "1552": 18, "16": [8, 17, 18], "1630859375": 18, "1684": 18, "16x16": 8, "17": 18, "1778": 18, "1782": 18, "18": [8, 18], "185546875": 18, "1900": 18, "1910": 8, "19342": 16, "19370": 16, "195": 6, "19598": 16, "199": 18, "1999": 18, "2": [3, 4, 6, 7, 9, 15, 18], "20": 18, "200": 10, "2000": 16, "2003": [4, 6], "2012": 6, "2013": [4, 6], "2015": 6, "2019": 4, "207901": 16, "21": 18, "2103": 6, "2186": 16, "21888": 16, "22": 18, "224": [8, 9], "225": 9, "22672": 16, "229": [9, 16], "23": 18, "233": 16, "236": 6, "24": 18, "246": 16, "249": 16, "25": 18, "2504": 18, "255": [7, 8, 9, 10, 18], "256": 8, "257": 16, "26": 18, "26032": 16, "264": 12, "27": 18, "2700": 16, "2710": 18, "2749": 12, "28": 18, "287": 12, "29": 18, "296": 12, "299": 12, "2d": 18, "3": [3, 4, 7, 8, 9, 10, 17, 18], "30": 18, "300": 16, "3000": 16, "301": 12, "30595": 18, "30ghz": 18, "31": 8, "32": [6, 8, 9, 12, 16, 17, 18], "3232421875": 18, "33": [9, 18], "33402": 16, "33608": 16, "34": [8, 18], "340": 18, "3456": 18, "3515625": 18, "36": 18, "360": 16, "37": [6, 18], "38": 18, "39": 18, "4": [8, 9, 10, 18], "40": 18, "406": 9, "41": 18, "42": 18, "43": 18, "44": 18, "45": 18, "456": 9, "46": 18, "47": 18, "472": 16, "48": [6, 18], "485": 9, "49": 18, "49377": 16, "5": [6, 9, 10, 15, 18], "50": [8, 16, 18], "51": 18, "51171875": 18, "512": 8, "52": [6, 18], "529": 18, "53": 18, "54": 18, "540": 18, "5478515625": 18, "55": 18, "56": 18, "57": 18, "58": [6, 18], "580": 18, "5810546875": 18, "583": 18, "59": 18, "597": 18, "5k": [4, 6], "5m": 18, "6": [9, 18], "60": 9, "600": [8, 10, 18], "61": 18, "62": 18, "626": 16, "63": 18, "64": [8, 9, 18], "641": 18, "647": 16, "65": 18, "66": 18, "67": 18, "68": 18, "69": 18, "693": 12, "694": 12, "695": 12, "6m": 18, "7": 18, "70": [6, 10, 18], "707470": 16, "71": [6, 18], "7100000": 16, "7141797": 16, "7149": 16, "72": 18, "72dpi": 7, "73": 18, "73257": 16, "74": 18, "75": [9, 18], "7581382": 16, "76": 18, "77": 18, "772": 12, "772875": 16, "78": 18, "785": 12, "79": 18, "793533": 16, "796": 16, "798": 12, "7m": 18, "8": [8, 9, 18], "80": 18, "800": [8, 10, 16, 18], "81": 18, "82": 18, "83": 18, "84": 18, "849": 16, "85": 18, "8564453125": 18, "857": 18, "85875": 16, "86": 18, "8603515625": 18, "87": 18, "8707": 16, "88": 18, "89": 18, "9": [3, 9, 18], "90": 18, "90k": 6, "90kdict32px": 6, "91": 18, "914085328578949": 18, "92": 18, "93": 18, "94": [6, 18], "95": 
[10, 18], "9578408598899841": 18, "96": 18, "97": 18, "98": 18, "99": 18, "9949972033500671": 18, "A": [1, 2, 4, 6, 7, 8, 11, 17], "As": 2, "Be": 18, "Being": 1, "By": 13, "For": [1, 2, 3, 12, 18], "If": [2, 7, 8, 12, 18], "In": [2, 6, 16], "It": [9, 14, 15, 17], "Its": [4, 8], "No": [1, 18], "Of": 6, "Or": [15, 17], "The": [1, 2, 6, 7, 10, 13, 15, 16, 17, 18], "Then": 8, "To": [2, 3, 13, 14, 15, 17, 18], "_": [1, 6, 8], "__call__": 18, "_build": 2, "_i": 10, "ab": 6, "abc": 17, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "abdef": [6, 16], "abl": [16, 18], "about": [1, 16, 18], "abov": 18, "abstractdataset": 6, "abus": 1, "accept": 1, "access": [4, 7, 16, 18], "account": [1, 14], "accur": 18, "accuraci": 10, "achiev": 17, "act": 1, "action": 1, "activ": 4, "ad": [2, 8, 9], "adapt": 1, "add": [9, 10, 14, 18], "add_hook": 18, "add_label": 10, "addit": [2, 3, 7, 15, 18], "addition": [2, 18], "address": [1, 7], "adjust": 9, "advanc": 1, "advantag": 17, "advis": 2, "aesthet": [4, 6], "affect": 1, "after": [14, 18], "ag": 1, "again": 8, "aggreg": [10, 16], "aggress": 1, "align": [1, 7, 9], "all": [1, 2, 5, 6, 7, 9, 10, 15, 16, 18], "allow": [1, 17], "along": 18, "alreadi": [2, 17], "also": [1, 8, 14, 15, 16, 18], "alwai": 16, "an": [1, 2, 4, 6, 7, 8, 10, 15, 17, 18], "analysi": [7, 15], "ancient_greek": 6, "angl": [7, 9], "ani": [1, 6, 7, 8, 9, 10, 17, 18], "annot": 6, "anot": 16, "anoth": [8, 12, 16], "answer": 1, "anyascii": 10, "anyon": 4, "anyth": 15, "api": [2, 4], "apolog": 1, "apologi": 1, "app": 2, "appear": 1, "appli": [1, 6, 9], "applic": [4, 8], "appoint": 1, "appreci": 14, "appropri": [1, 2, 18], "ar": [1, 2, 3, 5, 6, 7, 9, 10, 11, 15, 16, 18], "arab": 6, "arabic_diacrit": 6, "arabic_lett": 6, "arabic_punctu": 6, "arbitrarili": [4, 8], "arch": [8, 14], "architectur": [4, 8, 14, 15], "area": 18, "argument": [6, 7, 8, 10, 12, 18], "around": 1, "arrai": [7, 9, 10], "art": [4, 15], "artefact": [10, 15, 18], "artefact_typ": 7, "artifici": [4, 6], "arxiv": [6, 8], "asarrai": 10, "ascii_lett": 6, "aspect": [4, 8, 9, 18], "assess": 10, "assign": 10, "associ": 7, "assum": 8, "assume_straight_pag": [8, 12, 18], "astyp": [8, 10, 18], "attack": 1, "attend": [4, 8], "attent": [1, 8], "autom": 4, "automat": 18, "autoregress": [4, 8], "avail": [1, 4, 5, 9], "averag": [9, 18], "avoid": [1, 3], "aw": [4, 18], "awar": 18, "azur": 18, "b": [8, 10, 18], "b_j": 10, "back": 2, "backbon": 8, "backend": 18, "background": 16, "bangla": 6, "bar": 15, "bar_cod": 16, "base": [4, 8, 15], "baselin": [4, 8, 18], "batch": [6, 8, 9, 15, 16, 18], "batch_siz": [6, 12, 15, 16, 17], "bblanchon": 3, "bbox": 18, "becaus": 13, "been": [2, 10, 16, 18], "befor": [6, 8, 9, 18], "begin": 10, "behavior": [1, 18], "being": [10, 18], "belong": 18, "benchmark": 18, "best": 1, "better": [11, 18], "between": [9, 10, 18], "bgr": 7, "bilinear": 9, "bin_thresh": 18, "binar": [4, 8, 18], "binari": [7, 17, 18], "bit": 17, "block": [10, 18], "block_1_1": 18, "blur": 9, "bmvc": 6, "bn": 14, "bodi": [1, 18], "bool": [6, 7, 8, 9, 10], "boolean": [8, 18], "both": [4, 6, 9, 16, 18], "bottom": [8, 18], "bound": [6, 7, 8, 9, 10, 15, 16, 18], "box": [6, 7, 8, 9, 10, 15, 16, 18], "box_thresh": 18, "bright": 9, "browser": [2, 4], "build": [2, 3, 17], "built": 2, "byte": [7, 18], "c": [3, 7, 10], "c_j": 10, "cach": [2, 6, 13], "cache_sampl": 6, "call": 17, "callabl": [6, 9], "can": [2, 3, 12, 13, 14, 15, 16, 18], "capabl": [2, 11, 18], "case": [6, 10], "cf": 18, "cfg": 18, "challeng": 6, "challenge2_test_task12_imag": 6, 
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
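
The searchindex.js payload above is generated by Sphinx at build time and registers the per-page term and title mappings through Search.setIndex. For orientation, the loaded index can be inspected from the browser console of any built page. The snippet below is illustrative only and not part of this diff; it assumes the payload keys visible above ("titles", "titleterms") and the Search._index object that the searchtools.js hunks further down also reference.

// Illustrative console check, not part of the diff: inspect the index
// object that searchindex.js registered via Search.setIndex.
console.log(Object.keys(Search._index));   // includes "titles" and "titleterms", per the payload above
console.log(Search._index.titles[0]);      // -> "Changelog", the first entry of the "titles" array
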
diff --git a/using_doctr/custom_models_training.html b/using_doctr/custom_models_training.html
index 580b4368b7..e664c6a950 100644
--- a/using_doctr/custom_models_training.html
+++ b/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -615,7 +615,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/using_doctr/running_on_aws.html b/using_doctr/running_on_aws.html
index ddb0c3c80f..81c38b49f5 100644
--- a/using_doctr/running_on_aws.html
+++ b/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -358,7 +358,7 @@ AWS Lambda
-
+
diff --git a/using_doctr/sharing_models.html b/using_doctr/sharing_models.html
index 07a3b2f2a3..4f5d1d68a5 100644
--- a/using_doctr/sharing_models.html
+++ b/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -540,7 +540,7 @@ Recognition
-
+
diff --git a/using_doctr/using_contrib_modules.html b/using_doctr/using_contrib_modules.html
index b4a10925e6..cf282ff3a4 100644
--- a/using_doctr/using_contrib_modules.html
+++ b/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -411,7 +411,7 @@ ArtefactDetection
-
+
diff --git a/using_doctr/using_datasets.html b/using_doctr/using_datasets.html
index 4a52df36ba..e30b6d6459 100644
--- a/using_doctr/using_datasets.html
+++ b/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -638,7 +638,7 @@ Data Loading
-
+
diff --git a/using_doctr/using_model_export.html b/using_doctr/using_model_export.html
index 2b30ee63a1..ad9d09ed4c 100644
--- a/using_doctr/using_model_export.html
+++ b/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -463,7 +463,7 @@ Using your ONNX exported model
-
+
diff --git a/using_doctr/using_models.html b/using_doctr/using_models.html
index 13cb06116b..5c80dbf62d 100644
--- a/using_doctr/using_models.html
+++ b/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1249,7 +1249,7 @@ Advanced options
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/cord.html b/v0.1.0/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.0/_modules/doctr/datasets/cord.html
+++ b/v0.1.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/detection.html b/v0.1.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.0/_modules/doctr/datasets/detection.html
+++ b/v0.1.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/funsd.html b/v0.1.0/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.0/_modules/doctr/datasets/funsd.html
+++ b/v0.1.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic03.html b/v0.1.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.0/_modules/doctr/datasets/ic03.html
+++ b/v0.1.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic13.html b/v0.1.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.0/_modules/doctr/datasets/ic13.html
+++ b/v0.1.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiit5k.html b/v0.1.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiithws.html b/v0.1.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/imgur5k.html b/v0.1.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/loader.html b/v0.1.0/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.0/_modules/doctr/datasets/loader.html
+++ b/v0.1.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/mjsynth.html b/v0.1.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ocr.html b/v0.1.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.0/_modules/doctr/datasets/ocr.html
+++ b/v0.1.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/recognition.html b/v0.1.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.0/_modules/doctr/datasets/recognition.html
+++ b/v0.1.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/sroie.html b/v0.1.0/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.0/_modules/doctr/datasets/sroie.html
+++ b/v0.1.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svhn.html b/v0.1.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.0/_modules/doctr/datasets/svhn.html
+++ b/v0.1.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svt.html b/v0.1.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.0/_modules/doctr/datasets/svt.html
+++ b/v0.1.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/synthtext.html b/v0.1.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/utils.html b/v0.1.0/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.0/_modules/doctr/datasets/utils.html
+++ b/v0.1.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/wildreceipt.html b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.0/_modules/doctr/io/elements.html b/v0.1.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.0/_modules/doctr/io/elements.html
+++ b/v0.1.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.0/_modules/doctr/io/html.html b/v0.1.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.0/_modules/doctr/io/html.html
+++ b/v0.1.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/base.html b/v0.1.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.0/_modules/doctr/io/image/base.html
+++ b/v0.1.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/tensorflow.html b/v0.1.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/io/pdf.html b/v0.1.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.0/_modules/doctr/io/pdf.html
+++ b/v0.1.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.0/_modules/doctr/io/reader.html b/v0.1.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.0/_modules/doctr/io/reader.html
+++ b/v0.1.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/zoo.html b/v0.1.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/zoo.html b/v0.1.0/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/factory/hub.html b/v0.1.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.0/_modules/doctr/models/factory/hub.html
+++ b/v0.1.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/zoo.html b/v0.1.0/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/zoo.html b/v0.1.0/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.0/_modules/doctr/models/zoo.html
+++ b/v0.1.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/base.html b/v0.1.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/utils/metrics.html b/v0.1.0/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.0/_modules/doctr/utils/metrics.html
+++ b/v0.1.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.0/_modules/doctr/utils/visualization.html b/v0.1.0/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.0/_modules/doctr/utils/visualization.html
+++ b/v0.1.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.0/_modules/index.html b/v0.1.0/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.0/_modules/index.html
+++ b/v0.1.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.0/_sources/getting_started/installing.rst.txt b/v0.1.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.0/_sources/getting_started/installing.rst.txt
+++ b/v0.1.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.0/_static/basic.css b/v0.1.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.0/_static/basic.css
+++ b/v0.1.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.0/_static/doctools.js b/v0.1.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.0/_static/doctools.js
+++ b/v0.1.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.0/_static/language_data.js b/v0.1.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.0/_static/language_data.js
+++ b/v0.1.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
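
The stopword list above feeds the query tokenizer in searchtools.js. As a minimal sketch of that step (illustrative only, not part of the diff; the real pipeline also applies the stemmer and splitter this file mentions, which are omitted here):

// Illustrative sketch, not part of the diff: filter a raw query against
// the stopword list defined above before scoring; stemming omitted.
var terms = "the state of the art".split(/\s+/).filter(function (w) {
  return stopwords.indexOf(w.toLowerCase()) === -1;
});
// terms -> ["state", "art"]
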
diff --git a/v0.1.0/_static/searchtools.js b/v0.1.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.0/_static/searchtools.js
+++ b/v0.1.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
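
Taken together, the four hunks above tag every result group with a kind (title, index, object, then text); a hypothetical post-processing step grouping on that tag:

// Sketch: bucket the final results array by the kind each hunk attaches.
const groupByKind = (results) =>
  results.reduce((groups, result) => {
    const kind = result[6];
    (groups[kind] ??= []).push(result);
    return groups;
  }, {});
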
diff --git a/v0.1.0/changelog.html b/v0.1.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.0/changelog.html
+++ b/v0.1.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.0/community/resources.html b/v0.1.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.0/community/resources.html
+++ b/v0.1.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.0/contributing/code_of_conduct.html b/v0.1.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.0/contributing/code_of_conduct.html
+++ b/v0.1.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.0/contributing/contributing.html b/v0.1.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.0/contributing/contributing.html
+++ b/v0.1.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.0/genindex.html b/v0.1.0/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.0/genindex.html
+++ b/v0.1.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.0/getting_started/installing.html b/v0.1.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.0/getting_started/installing.html
+++ b/v0.1.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.0/index.html b/v0.1.0/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.0/index.html
+++ b/v0.1.0/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.0/modules/contrib.html b/v0.1.0/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.0/modules/contrib.html
+++ b/v0.1.0/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.0/modules/datasets.html b/v0.1.0/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.0/modules/datasets.html
+++ b/v0.1.0/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.0/modules/io.html b/v0.1.0/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.0/modules/io.html
+++ b/v0.1.0/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.0/modules/models.html b/v0.1.0/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.0/modules/models.html
+++ b/v0.1.0/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/transforms.html b/v0.1.0/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.0/modules/transforms.html
+++ b/v0.1.0/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.0/using_doctr/custom_models_training.html b/v0.1.0/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.0/using_doctr/custom_models_training.html
+++ b/v0.1.0/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.0/using_doctr/running_on_aws.html b/v0.1.0/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.0/using_doctr/running_on_aws.html
+++ b/v0.1.0/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
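
The retained header comment above is accurate: language_data.js only supplies data (the stopword list, stemmer, scorer and splitter) that searchtools.js consumes at query time. Below is a hedged sketch of the kind of filtering that stopword list enables; the array is abridged inline so the snippet stands alone, and the query string and output are purely illustrative.

// Hedged sketch: dropping stopwords from a raw query before lookup.
// "stopwords" here is abridged from the full list in language_data.js above.
const stopwords = ["a", "and", "the", "to", "is", "of", "on", "for"];
const terms = "how to install the library"
  .toLowerCase()
  .split(/\s+/)
  .filter((word) => !stopwords.includes(word)); // keep only meaningful terms
console.log(terms); // -> ["how", "install", "library"]
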
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
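
Taken together, the searchtools.js hunks above make one coherent change: each result tuple grows a trailing kind field drawn from the new SearchResultKind values (index, object, text, title); _displayItem exposes that field as a kind-<kind> class on each result <li> so themes can style hits by type; the result list gains role="list" for accessibility; and the result-count message is pluralized through Documentation.ngettext instead of a single interpolated string. The following is a minimal sketch of how a theme might consume the new class names; the concrete styles are assumptions, while the ul.search container and the kind-* names come straight from the hunks.

// Illustrative theme-side code, not part of the patch: style search hits
// by the kind-* class that the patched _displayItem adds to each <li>.
// Injecting a stylesheet also covers results rendered after this runs.
const style = document.createElement("style");
style.textContent = `
  ul.search li.kind-object { font-family: monospace; } /* API/object hits */
  ul.search li.kind-title  { font-weight: bold; }      /* page-title hits */
`;
document.head.appendChild(style);
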
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
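For context, a minimal usage sketch of the CORD API after the change above; the flags and return structures follow the diff, while exact shapes are illustrative assumptions:

from doctr.datasets import CORD

# Default mode: one sample per image, with clipped boxes and string labels
train_set = CORD(train=True, download=True)
img, target = train_set[0]  # target: dict(boxes=np.ndarray, labels=list)

# Polygon mode: each box is returned as a (4, 2) array of corner coordinates
poly_set = CORD(train=True, download=True, use_polygons=True)

# Recognition mode: samples become (word crop, transcription) pairs
rec_set = CORD(train=True, download=True, recognition_task=True)
crop, label = rec_set[0]

# Passing recognition_task=True and detection_task=True together raises ValueError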
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets.core - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
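The deleted core module above carried the original download-and-extract plumbing; a hypothetical subclass of the removed VisionDataset (placeholder URL, no hash check) would have been wired up roughly like this:

from doctr.datasets.core import VisionDataset  # module removed in this diff; shown for illustration


class ToyDataset(VisionDataset):
    """Sketch of a dataset on the legacy base class; the archive URL is made up."""

    def __init__(self, download: bool = False) -> None:
        super().__init__(
            url="https://example.com/toy_dataset.zip",  # placeholder archive
            file_name="toy_dataset.zip",
            file_hash=None,        # skipping SHA256 verification in this sketch
            extract_archive=True,  # unzips under ~/.cache/doctr/datasets
            download=download,
        )
        # subclasses then populate self.data with (sample, target) entries

    def __getitem__(self, index):
        return self.data[index]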
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# # List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
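The use_polygons branch introduced above turns an (xmin, ymin, xmax, ymax) box into its four corner points; as a standalone snippet, the conversion is simply:

import numpy as np

box = [20, 30, 120, 60]  # xmin, ymin, xmax, ymax (made-up values)
polygon = np.array(
    [
        [box[0], box[1]],  # top left
        [box[2], box[1]],  # top right
        [box[2], box[3]],  # bottom right
        [box[0], box[3]],  # bottom left
    ],
    dtype=np.float32,
)
# polygon.shape == (4, 2), the same corner ordering the FUNSD loader produces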
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
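The reworked DataLoader above swaps the multithreaded fetch for a plain map, adds __len__, and accepts a collate_fn override; a small sketch, assuming any object with __len__/__getitem__ serves as the dataset and that iteration follows the reset/__next__ logic shown:

import tensorflow as tf

from doctr.datasets import DataLoader

# Toy dataset: ten (image, target) pairs; a stand-in for CORD, FUNSD, etc.
samples = [(tf.zeros((32, 32, 3)), tf.constant(i)) for i in range(10)]

loader = DataLoader(samples, shuffle=True, batch_size=4, drop_last=False)
print(len(loader))  # 3, i.e. ceil(10 / 4)

for images, targets in loader:
    # default_collate zips the samples and stacks each field along axis 0
    print(images.shape)  # (4, 32, 32, 3) for the full batches, (2, ...) for the last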
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
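The rewritten SROIE loader above packs each row's eight coordinates into a (4, 2) polygon and, when use_polygons is False, reduces the stack to straight boxes via a per-polygon min/max; in isolation:

import numpy as np

row = ["10", "20", "110", "20", "110", "60", "10", "60"]  # one annotation row (made-up)
polys = np.stack(
    [np.array(list(map(int, row[:8])), dtype=np.float32).reshape((4, 2))], axis=0
)  # shape (1, 4, 2)

straight = np.concatenate((polys.min(axis=1), polys.max(axis=1)), axis=1)
print(straight)  # [[ 10.  20. 110.  60.]] -> xmin, ymin, xmax, ymax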
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated into the given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionnary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char still not in vocab, return unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
-
-
+
+
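A quick round trip through the encoding helpers rewritten above; the three-character vocab is invented for the example, real ones live in doctr.datasets.vocabs.VOCABS:

from doctr.datasets.utils import decode_sequence, encode_sequences

vocab = "abc"
encoded = encode_sequences(["ab", "c"], vocab, eos=3, pad=4)
# Each row is the encoded word followed by one EOS (3), then PAD (4) up to the
# common length: [[0 1 3 4],
#                 [2 3 4 4]]

print(decode_sequence([0, 1], vocab))  # "ab"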
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents.elements - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and the confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
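Although the page above is deleted, the element hierarchy it documented composes straightforwardly; a minimal sketch with invented relative geometries:

from doctr.documents.elements import Block, Document, Line, Page, Word

words = [
    Word("Hello", 0.99, ((0.10, 0.10), (0.30, 0.15))),
    Word("world", 0.98, ((0.32, 0.10), (0.50, 0.15))),
]
line = Line(words)  # geometry resolved to the smallest enclosing bbox
page = Page(blocks=[Block(lines=[line])], page_idx=0, dimensions=(595, 842))
doc = Document(pages=[page])

print(doc.render())  # "Hello world"
print(sorted(page.export()))  # ['blocks', 'dimensions', 'language', 'orientation', 'page_idx']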
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents.reader - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- a PyMuPDF Document object, whose pages can later be rendered into numpy arrays
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 PDF;
- to increase the resolution while preserving the A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the original one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
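A minimal sketch of the scale computation above, assuming a standard A4 page (595 x 842 points at 72 dpi) and the target size suggested in the docstring:

    # Illustration only: mirrors scales = (output_size[1] / MediaBox[2], output_size[0] / MediaBox[3])
    media_box = (0, 0, 595, 842)   # assumed A4 MediaBox: (x0, y0, x1, y1) in points
    output_size = (1024, 726)      # requested size in (H, W) order

    scale_x = output_size[1] / media_box[2]  # 726 / 595  ~= 1.22
    scale_y = output_size[0] / media_box[3]  # 1024 / 842 ~= 1.22
    # fitz.Matrix(scale_x, scale_y) then renders the page at roughly 88 dpi.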
-
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to expand (unclip) polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: The first parameter.
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to a ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
-
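A small numeric check of the unclip distance used above, on a hypothetical 10 x 10 square (same shapely/pyclipper calls as in the method):

    import numpy as np
    import pyclipper
    from shapely.geometry import Polygon

    square = np.array([[0, 0], [10, 0], [10, 10], [0, 10]])
    poly = Polygon(square)
    # distance = area * unclip_ratio / perimeter = 100 * 1.5 / 40 = 3.75
    distance = poly.area * 1.5 / poly.length

    offset = pyclipper.PyclipperOffset()
    offset.AddPath(square.tolist(), pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
    expanded = offset.Execute(distance)  # polygon grown outwards by ~3.75 px per side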
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- numpy array of boxes for the bitmap; each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1):
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
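To make the index order of the top-down loop explicit, a toy sketch with scalars standing in for feature maps:

    # Index 0 is the finest map, index 3 the coarsest; upsampling is elided here.
    results = [1, 2, 3, 4]
    for idx in range(len(results) - 2, -1, -1):
        results[idx] += results[idx + 1]  # the real call upsamples results[idx + 1] first
    # results == [10, 9, 7, 4]: every level accumulates all coarser levels.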
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature maps is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
-
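For intuition, a standalone check of the law-of-cosines distance above: for the segment a=(0, 0), b=(4, 0) and the point (2, 1), the perpendicular distance should be 1.

    import numpy as np

    a, b = np.array([0.0, 0.0]), np.array([4.0, 0.0])
    xs, ys = np.array([[2.0]]), np.array([[1.0]])  # a 1x1 "map" holding one query point

    sq1 = np.square(xs - a[0]) + np.square(ys - a[1])        # |pa|^2 = 5
    sq2 = np.square(xs - b[0]) + np.square(ys - b[1])        # |pb|^2 = 5
    sq_ab = np.square(a[0] - b[0]) + np.square(a[1] - b[1])  # |ab|^2 = 16
    cosin = (sq_ab - sq1 - sq2) / (2 * np.sqrt(sq1 * sq2) + 1e-7)  # 0.6
    dist = np.sqrt(sq1 * sq2 * (1 - np.square(cosin)) / sq_ab)
    # dist[0, 0] == 1.0; when cosin < 0 the method falls back to the nearer endpoint.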
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon : array of coord., to draw the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
-
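The negative_count capping above is online hard negative mining with a 3:1 negative-to-positive ratio; a minimal numpy sketch of the same bookkeeping, with made-up per-pixel losses:

    import numpy as np

    bce = np.array([0.9, 0.1, 0.8, 0.2, 0.7, 0.05])    # per-pixel BCE, illustration only
    target = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])  # one positive, five negatives

    pos_count = target.sum()                                 # 1
    neg_count = int(min((1 - target).sum(), 3 * pos_count))  # capped at 3 negatives
    hardest = np.sort(bce * (1 - target))[::-1][:neg_count]  # keeps 0.8, 0.7, 0.2
    balanced = (np.sum(bce * target) + hardest.sum()) / (pos_count + neg_count + 1e-6)
    # balanced ~= 0.65: the easy negatives (0.1, 0.05) are excluded from the loss.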
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- numpy array of boxes for the bitmap; each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num): # labels run from 0 (background) to label_num - 1
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
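A self-contained sketch of the connected-component box extraction above, on a tiny synthetic bitmap (the size and score filters are elided):

    import cv2
    import numpy as np

    bitmap = np.zeros((8, 8), dtype=np.uint8)
    bitmap[2:5, 1:6] = 1  # one blob of "text" pixels

    label_num, labelimage = cv2.connectedComponents(bitmap, connectivity=4)
    for label in range(1, label_num):  # label 0 is the background
        points = np.array(np.where(labelimage == label)[::-1]).T.astype(np.int32)  # (x, y) pairs, cast for OpenCV
        x, y, w, h = cv2.boundingRect(points)  # -> (1, 2, 5, 3) for this blob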
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
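For intuition, what the greedy decoder does, written out without TensorFlow: take the per-timestep argmax, merge repeats, then drop the blank (index len(vocab)):

    import numpy as np

    vocab = "ab"
    blank = len(vocab)  # index 2 is the CTC blank
    best_path = np.array([0, 0, 2, 1, 1])  # assumed per-timestep argmax: "a a - b b"

    merged = [k for i, k in enumerate(best_path) if i == 0 or k != best_path[i - 1]]
    decoded = "".join(vocab[k] for k in merged if k != blank)
    # decoded == "ab": repeats collapsed, blank removed.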
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
- with label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
-            model_output: predicted logits of the model
-            target: list of ground-truth words for the batch
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
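The transpose-and-reshape step in call is what turns a 2D feature map into a sequence for the recurrent decoder. A NumPy sketch with illustrative shapes only:

import numpy as np

feats = np.zeros((8, 4, 32, 64))  # B x H x W x C from the backbone
seq = feats.transpose(0, 2, 1, 3).reshape(8, 32, 4 * 64)  # B x W x (H * C)
print(seq.shape)  # (8, 32, 256): one timestep per horizontal position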
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
-
- doctr.models.recognition.sar - docTR documentation
-
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
-        # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
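To make the shapes above concrete, here is a NumPy re-computation of the glimpse (an illustrative sketch; the random scores stand in for the projector outputs):

import numpy as np

N, H, W, C = 2, 4, 8, 16
features = np.random.rand(N, H, W, C)
scores = np.random.rand(N, H, W, 1)      # stand-in for attention_projector output
attn = np.exp(scores.reshape(N, -1))
attn /= attn.sum(axis=1, keepdims=True)  # softmax over the H * W positions
glimpse = (features * attn.reshape(N, H, W, 1)).sum(axis=(1, 2))
print(glimpse.shape)  # (N, C): one weighted feature vector per sample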
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
-        symbol = tf.fill([features.shape[0]], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
-            # embedded_symbol: shape (N, embedding_units)
-            embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
-            logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
-            # logits: shape (N, rnn_units), glimpse: shape (N, C)
-            logits = tf.concat([logits, glimpse], axis=-1)
-            # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
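The decoding loop's feedback rule is worth isolating: during training the next input symbol is the ground-truth character (teacher forcing), while at inference it is the model's own argmax. A self-contained toy sketch:

import numpy as np

rng = np.random.default_rng(0)
vocab_size, steps, training = 4, 3, True
gt = np.array([[1, 2, 3]])           # encoded ground truth, shape (N, steps)
symbol = np.array([vocab_size + 1])  # virtual START index
for t in range(steps):
    logits = rng.random((1, vocab_size + 1))  # stand-in for one LSTM + attention step
    symbol = gt[:, t] if training else logits.argmax(-1)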
-
-class SAR(RecognitionModel):
-    """Implements a SAR architecture as described in `"Show, Attend and Read: A Simple and Strong Baseline for
-    Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
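The masking logic in compute_loss can be checked by hand with NumPy (a sketch; the constant per-timestep losses are fabricated):

import numpy as np

cce = np.ones((2, 5))                      # fake per-timestep losses, B x T
seq_len = np.array([2, 4]) + 1             # +1 for the <eos> token
mask = np.arange(5)[None, :] < seq_len[:, None]
loss = (cce * mask).sum(axis=1) / seq_len  # average over valid timesteps only
print(loss)  # [1. 1.]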
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
-    """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read: A Simple and Strong
-    Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
-    """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read: A Simple and Strong
-    Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
-    Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
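For reference, a minimal usage sketch of the updated signature shown above (assumes network access to download pretrained weights):

from doctr.models import recognition_predictor

predictor = recognition_predictor("crnn_vgg16_bn", pretrained=True, symmetric_pad=True, batch_size=32)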
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
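A short usage sketch of ocr_predictor with a few of the options documented above (illustrative only; pretrained weights require network access):

import numpy as np
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True, assume_straight_pages=False, export_as_straight_boxes=True)
page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
result = model([page])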
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+    >>> from doctr.models import kie_predictor
+    >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
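And the KIE counterpart, mirroring the docstring example above (sketch, same caveats as for ocr_predictor):

import numpy as np
from doctr.models import kie_predictor

model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
out = model([input_page])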
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
-
- doctr.transforms.modules - docTR documentation
-
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
-    """Applies a user-defined function to a tensor
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
-        >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
-    """Applies the following transformation to a tensor (image or batch of images):
-    convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
-        >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
-    Example::
-        >>> from doctr.transforms import RandomBrightness
-        >>> import tensorflow as tf
-        >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
-        max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
-    Example::
-        >>> from doctr.transforms import RandomContrast
-        >>> import tensorflow as tf
-        >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
-        delta: multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
-    Example::
-        >>> from doctr.transforms import RandomSaturation
-        >>> import tensorflow as tf
-        >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
-        delta: multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
-        >>> from doctr.transforms import RandomHue
-        >>> import tensorflow as tf
-        >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
-    """Randomly performs gamma correction for a tensor (batch of images or image)
-
-    Example::
-        >>> from doctr.transforms import RandomGamma
-        >>> import tensorflow as tf
-        >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
-        >>> from doctr.transforms import RandomJpegQuality
-        >>> import tensorflow as tf
-        >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
-        >>> from doctr.transforms import OneOf
-        >>> import tensorflow as tf
-        >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
-        >>> from doctr.transforms import RandomApply
-        >>> import tensorflow as tf
-        >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
\ No newline at end of file
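Putting the modules above together, a typical augmentation pipeline might look like the following sketch (class names as exported in __all__ above; parameter values are illustrative):

import tensorflow as tf
from doctr.transforms import Compose, Resize, ColorInversion, RandomApply, OneOf, RandomBrightness, RandomContrast

augment = Compose([
    Resize((32, 32)),
    RandomApply(ColorInversion(min_val=0.6), p=0.3),
    OneOf([RandomBrightness(max_delta=0.3), RandomContrast(delta=0.3)]),
])
out = augment(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))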
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
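For instance (a sketch, assuming the anyascii package is installed and importing straight from the module shown here), a pair differing only in case matches at the caseless and unicase levels:

from doctr.utils.metrics import string_match

print(string_match("Hello", "hello"))  # (False, True, False, True)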
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
            gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
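A quick sanity check for box_iou (sketch): one ground-truth box against a half-overlapping and a disjoint candidate.

import numpy as np
from doctr.utils.metrics import box_iou

a = np.array([[0, 0, 100, 100]], dtype=float)
b = np.array([[0, 0, 50, 100], [200, 200, 300, 300]], dtype=float)
print(box_iou(a, b))  # [[0.5 0. ]]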
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
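Sanity check for polygon_iou with two overlapping 2x2 squares (sketch):

import numpy as np
from doctr.utils.metrics import polygon_iou

sq1 = np.array([[[0, 0], [2, 0], [2, 2], [0, 2]]], dtype=float)
sq2 = np.array([[[1, 1], [3, 1], [3, 3], [1, 3]]], dtype=float)
print(polygon_iou(sq1, sq2))  # [[~0.143]]: intersection 1 over union 7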
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+    """Perform non-max suppression, borrowed from `Fast R-CNN <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
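To see the greedy suppression at work (a minimal sketch with hand-picked boxes):

>>> import numpy as np
>>> from doctr.utils.metrics import nms
>>> boxes = np.array([
...     [0, 0, 100, 100, 0.9],      # highest score, kept first
...     [5, 5, 105, 105, 0.8],      # IoU with the first box is ~0.82 > 0.5, suppressed
...     [200, 200, 300, 300, 0.7],  # disjoint from the first box, kept
... ])
>>> nms(boxes, thresh=0.5)  # -> [0, 2]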
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
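Putting the pieces together for localization (a sketch; boxes are expected in relative coordinates, as stated in the update docstring):

>>> import numpy as np
>>> from doctr.utils.metrics import LocalizationConfusion
>>> metric = LocalizationConfusion(iou_thresh=0.5)
>>> gts = np.array([[0.1, 0.1, 0.4, 0.4]])  # one ground-truth box
>>> preds = np.array([[0.1, 0.1, 0.4, 0.4], [0.6, 0.6, 0.9, 0.9]])  # one hit, one spurious box
>>> metric.update(gts, preds)
>>> metric.summary()  # -> (1.0, 0.5, 0.5): recall, precision, mean IoU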
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and strings for both the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
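The end-to-end metric follows the same cycle, except a detection match only counts once the transcript comparison also succeeds (a minimal sketch):

>>> import numpy as np
>>> from doctr.utils.metrics import OCRMetric
>>> metric = OCRMetric(iou_thresh=0.5)
>>> boxes = np.array([[0.1, 0.1, 0.4, 0.4]])
>>> metric.update(boxes, boxes, ["Hello"], ["hello"])  # perfect box, case-only text mismatch
>>> recall, precision, mean_iou = metric.summary()
>>> recall  # -> {'raw': 0.0, 'caseless': 1.0, 'anyascii': 0.0, 'unicase': 1.0}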
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and labels for both the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the overall recall, precision and mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
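And the class-aware variant, where a label comparison replaces the string comparison (a minimal sketch):

>>> import numpy as np
>>> from doctr.utils.metrics import DetectionMetric
>>> metric = DetectionMetric(iou_thresh=0.5)
>>> gt_boxes = np.array([[0.1, 0.1, 0.4, 0.4]])
>>> pred_boxes = np.array([[0.1, 0.1, 0.4, 0.4], [0.6, 0.6, 0.9, 0.9]])
>>> metric.update(gt_boxes, pred_boxes, np.array([0]), np.array([0, 1]))  # matched box carries the right class
>>> metric.summary()  # -> (1.0, 0.5, 0.5)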
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
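A short look at how the dispatch resolves (a sketch, assuming matplotlib is installed):

>>> import numpy as np
>>> rect = create_obj_patch(((0.1, 0.1), (0.4, 0.4)), (100, 200))  # 2-point tuple -> rect_patch
>>> type(rect).__name__
'Rectangle'
>>> poly = create_obj_patch(np.zeros((4, 2)), (100, 200))  # (4, 2) array -> polygon_patch
>>> type(poly).__name__
'Polygon'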
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors colors for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
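Since get_colors spaces hues evenly but randomizes lightness and saturation slightly, only the structure of its output is deterministic:

>>> colors = get_colors(3)  # three hues, 120 degrees apart
>>> len(colors), len(colors[0])
(3, 3)
>>> all(0.0 <= channel <= 1.0 for rgb in colors for channel in rgb)  # matplotlib-ready RGB floats
True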
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mlp Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with KIE predictions
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%})",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mlp Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
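A minimal sketch of draw_boxes on a synthetic page (assuming OpenCV and matplotlib are installed; the function renders via plt.imshow, so the caller still has to trigger display):

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> page = np.zeros((100, 200, 3), dtype=np.uint8)  # blank 100x200 page
>>> boxes = np.array([[0.1, 0.1, 0.5, 0.4]])  # one straight box in relative coords
>>> draw_boxes(boxes, page, color=(0, 255, 0))
>>> plt.show()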
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-..autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developper mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Either performed at once or separately, to each task corresponds a type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors lets you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, we warm-up the model and then we measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection will be used to produces cropped images that will be passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedure. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module regroups non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model performances.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
-
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred
-framework can save you significant time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
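Since VisionDataset is abstract, it is meant to be subclassed rather than instantiated directly. A minimal sketch of a custom dataset built on the signature above (the URL and archive name are hypothetical placeholders):
>>> from doctr.datasets.core import VisionDataset
>>> class MyDataset(VisionDataset):
...     def __init__(self, **kwargs):
...         # hypothetical URL and archive name, for illustration only
...         super().__init__(
...             url="https://example.com/mydataset.zip",
...             file_name="mydataset.zip",
...             extract_archive=True,
...             download=True,
...             **kwargs,
...         )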
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
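All three datasets above accept a sample_transforms callable applied to each image. A minimal sketch pairing FUNSD with a resizing transform (Resize from doctr.transforms is assumed to be available here, as it is in later versions of the library):
>>> from doctr.datasets import FUNSD
>>> from doctr.transforms import Resize  # assumed import path
>>> train_set = FUNSD(train=True, download=True, sample_transforms=Resize((512, 512)))
>>> img, target = train_set[0]  # img is resized before being returned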
-
-
-Data Loading¶
-Each dataset has its own way of loading a sample, but batch aggregation and iteration are deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-¶
-
-
-
-
-
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
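Assuming the vocabs above are also exposed programmatically as a dictionary keyed by name (a VOCABS export is an assumption, not confirmed on this page), they can be inspected directly:
>>> from doctr.datasets import VOCABS  # assumed export
>>> len(VOCABS["digits"])  # matches the "size" column above
10
>>> VOCABS["latin"].startswith(VOCABS["digits"])  # latin extends digits
True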
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
-
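A short sketch of this function in action, assuming each character is mapped to its index in the vocab and shorter sequences are padded with the eos value up to target_size (behaviour inferred from the description above):
>>> from doctr.datasets import encode_sequences
>>> encoded = encode_sequences(
...     sequences=["123", "42"],
...     vocab="0123456789",  # character index = position in this string
...     target_size=5,       # pad every sequence to length 5
...     eos=-1,              # padding value marking End Of String
... )
>>> encoded.shape  # one row per input sequence
(2, 5)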
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
-
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same height but in different columns are considered to belong to two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and the confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
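Putting the structure together, a minimal sketch building the hierarchy by hand from Word up to Document (the Document constructor is not documented on this page, so the pages argument is an assumption):
>>> from doctr.documents import Word, Line, Block, Page, Document
>>> word = Word(value="Hello", confidence=0.99, geometry=((0.1, 0.1), (0.3, 0.15)))
>>> line = Line(words=[word])    # geometry resolved from the enclosed words
>>> block = Block(lines=[line])  # artefacts default to an empty list
>>> page = Page(blocks=[block], page_idx=0, dimensions=(595, 842))  # (width, height)
>>> doc = Document(pages=[page])  # assumed signature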
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into images in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF byte stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert it into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
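Combining the helpers above into a short end-to-end sketch (the file path is a placeholder):
>>> from doctr.documents import DocumentFile
>>> pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
>>> pages = pdf_doc.as_images()          # one H x W x 3 ndarray per page
>>> words = pdf_doc.get_words()          # per-page (bounding box, value) tuples
>>> artefacts = pdf_doc.get_artefacts()  # per-page bounding boxes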
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
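The "3 lines of code" claim in the feature list above can be illustrated with a minimal sketch of the current API (architecture names are taken from the model zoo below):
>>> from doctr.io import DocumentFile
>>> from doctr.models import ocr_predictor
>>> model = ocr_predictor(det_arch="db_resnet50", reco_arch="crnn_vgg16_bn", pretrained=True)
>>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
>>> result = model(doc)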
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
Returns:¶
-
+
diff --git a/modules/models.html b/modules/models.html
index bf45d11a71..f4a9833365 100644
--- a/modules/models.html
+++ b/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1598,7 +1598,7 @@ Args:¶
-
+
diff --git a/modules/transforms.html b/modules/transforms.html
index 6d77d16e7b..bc254c867b 100644
--- a/modules/transforms.html
+++ b/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -831,7 +831,7 @@ Args:¶<
-
+
diff --git a/modules/utils.html b/modules/utils.html
index 3dd3ecbd96..6784d81f6f 100644
--- a/modules/utils.html
+++ b/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -711,7 +711,7 @@ Args:¶
-
+
diff --git a/notebooks.html b/notebooks.html
index f3ea994e49..647f73d4eb 100644
--- a/notebooks.html
+++ b/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -387,7 +387,7 @@ docTR Notebooks
-
+
diff --git a/search.html b/search.html
index f0693e2c97..0e0da5efb3 100644
--- a/search.html
+++ b/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -336,7 +336,7 @@
-
+
diff --git a/searchindex.js b/searchindex.js
index 8598997441..df18967072 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[1, "correction"]], "2. Warning": [[1, "warning"]], "3. Temporary Ban": [[1, "temporary-ban"]], "4. Permanent Ban": [[1, "permanent-ban"]], "AWS Lambda": [[13, null]], "Advanced options": [[18, "advanced-options"]], "Args:": [[6, "args"], [6, "id4"], [6, "id7"], [6, "id10"], [6, "id13"], [6, "id16"], [6, "id19"], [6, "id22"], [6, "id25"], [6, "id29"], [6, "id32"], [6, "id37"], [6, "id40"], [6, "id46"], [6, "id49"], [6, "id50"], [6, "id51"], [6, "id54"], [6, "id57"], [6, "id60"], [6, "id61"], [7, "args"], [7, "id2"], [7, "id3"], [7, "id4"], [7, "id5"], [7, "id6"], [7, "id7"], [7, "id10"], [7, "id12"], [7, "id14"], [7, "id16"], [7, "id20"], [7, "id24"], [7, "id28"], [8, "args"], [8, "id3"], [8, "id8"], [8, "id13"], [8, "id17"], [8, "id21"], [8, "id26"], [8, "id31"], [8, "id36"], [8, "id41"], [8, "id46"], [8, "id50"], [8, "id54"], [8, "id59"], [8, "id63"], [8, "id68"], [8, "id73"], [8, "id77"], [8, "id81"], [8, "id85"], [8, "id90"], [8, "id95"], [8, "id99"], [8, "id104"], [8, "id109"], [8, "id114"], [8, "id119"], [8, "id123"], [8, "id127"], [8, "id132"], [8, "id137"], [8, "id142"], [8, "id146"], [8, "id150"], [8, "id155"], [8, "id159"], [8, "id163"], [8, "id167"], [8, "id169"], [8, "id171"], [8, "id173"], [9, "args"], [9, "id1"], [9, "id2"], [9, "id3"], [9, "id4"], [9, "id5"], [9, "id6"], [9, "id7"], [9, "id8"], [9, "id9"], [9, "id10"], [9, "id11"], [9, "id12"], [9, "id13"], [9, "id14"], [9, "id15"], [9, "id16"], [9, "id17"], [9, "id18"], [9, "id19"], [10, "args"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"]], "Artefact": [[7, "artefact"]], "ArtefactDetection": [[15, "artefactdetection"]], "Attribution": [[1, "attribution"]], "Available Datasets": [[16, "available-datasets"]], "Available architectures": [[18, "available-architectures"], [18, "id1"], [18, "id2"]], "Available contribution modules": [[15, "available-contribution-modules"]], "Block": [[7, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[16, null]], "Choosing the right model": [[18, null]], "Classification": [[14, "classification"]], "Code quality": [[2, "code-quality"]], "Code style verification": [[2, "code-style-verification"]], "Codebase structure": [[2, "codebase-structure"]], "Commits": [[2, "commits"]], "Composing transformations": [[9, "composing-transformations"]], "Continuous Integration": [[2, "continuous-integration"]], "Contributing to docTR": [[2, null]], "Contributor Covenant Code of Conduct": [[1, null]], "Custom dataset loader": [[6, "custom-dataset-loader"]], "Custom orientation classification models": [[12, "custom-orientation-classification-models"]], "Data Loading": [[16, "data-loading"]], "Dataloader": [[6, "dataloader"]], "Detection": [[14, "detection"], [16, "detection"]], "Detection predictors": [[18, "detection-predictors"]], "Developer mode installation": [[2, "developer-mode-installation"]], "Developing docTR": [[2, "developing-doctr"]], "Document": [[7, "document"]], "Document structure": [[7, "document-structure"]], "End-to-End OCR": [[18, "end-to-end-ocr"]], "Enforcement": [[1, "enforcement"]], "Enforcement Guidelines": [[1, "enforcement-guidelines"]], "Enforcement Responsibilities": [[1, "enforcement-responsibilities"]], "Export to ONNX": [[17, "export-to-onnx"]], "Feature requests & bug report": [[2, "feature-requests-bug-report"]], "Feedback": [[2, "feedback"]], "File reading": [[7, "file-reading"]], "Half-precision": [[17, "half-precision"]], "Installation": [[3, null]], 
"Integrate contributions into your pipeline": [[15, null]], "Let\u2019s connect": [[2, "let-s-connect"]], "Line": [[7, "line"]], "Loading from Huggingface Hub": [[14, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[12, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[12, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[4, "main-features"]], "Model optimization": [[17, "model-optimization"]], "Model zoo": [[4, "model-zoo"]], "Modifying the documentation": [[2, "modifying-the-documentation"]], "Naming conventions": [[14, "naming-conventions"]], "OCR": [[16, "ocr"]], "Object Detection": [[16, "object-detection"]], "Our Pledge": [[1, "our-pledge"]], "Our Standards": [[1, "our-standards"]], "Page": [[7, "page"]], "Preparing your model for inference": [[17, null]], "Prerequisites": [[3, "prerequisites"]], "Pretrained community models": [[14, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[14, "pushing-to-the-huggingface-hub"]], "Questions": [[2, "questions"]], "Recognition": [[14, "recognition"], [16, "recognition"]], "Recognition predictors": [[18, "recognition-predictors"]], "Returns:": [[6, "returns"], [7, "returns"], [7, "id11"], [7, "id13"], [7, "id15"], [7, "id19"], [7, "id23"], [7, "id27"], [7, "id31"], [8, "returns"], [8, "id6"], [8, "id11"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id29"], [8, "id34"], [8, "id39"], [8, "id44"], [8, "id49"], [8, "id53"], [8, "id57"], [8, "id62"], [8, "id66"], [8, "id71"], [8, "id76"], [8, "id80"], [8, "id84"], [8, "id88"], [8, "id93"], [8, "id98"], [8, "id102"], [8, "id107"], [8, "id112"], [8, "id117"], [8, "id122"], [8, "id126"], [8, "id130"], [8, "id135"], [8, "id140"], [8, "id145"], [8, "id149"], [8, "id153"], [8, "id158"], [8, "id162"], [8, "id166"], [8, "id168"], [8, "id170"], [8, "id172"], [10, "returns"]], "Scope": [[1, "scope"]], "Share your model with the community": [[14, null]], "Supported Vocabs": [[6, "supported-vocabs"]], "Supported contribution modules": [[5, "supported-contribution-modules"]], "Supported datasets": [[4, "supported-datasets"]], "Supported transformations": [[9, "supported-transformations"]], "Synthetic dataset generator": [[6, "synthetic-dataset-generator"], [16, "synthetic-dataset-generator"]], "Task evaluation": [[10, "task-evaluation"]], "Text Detection": [[18, "text-detection"]], "Text Recognition": [[18, "text-recognition"]], "Text detection models": [[4, "text-detection-models"]], "Text recognition models": [[4, "text-recognition-models"]], "Train your own model": [[12, null]], "Two-stage approaches": [[18, "two-stage-approaches"]], "Unit tests": [[2, "unit-tests"]], "Use your own datasets": [[16, "use-your-own-datasets"]], "Using your ONNX exported model": [[17, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[3, "via-conda-only-for-linux"]], "Via Git": [[3, "via-git"]], "Via Python Package": [[3, "via-python-package"]], "Visualization": [[10, "visualization"]], "What should I do with the output?": [[18, "what-should-i-do-with-the-output"]], "Word": [[7, "word"]], "docTR Notebooks": [[11, null]], "docTR Vocabs": [[6, "id62"]], "docTR: Document Text Recognition": [[4, null]], "doctr.contrib": [[5, null]], "doctr.datasets": [[6, null], [6, "datasets"]], "doctr.io": [[7, null]], "doctr.models": [[8, null]], "doctr.models.classification": [[8, "doctr-models-classification"]], "doctr.models.detection": [[8, "doctr-models-detection"]], "doctr.models.factory": [[8, 
"doctr-models-factory"]], "doctr.models.recognition": [[8, "doctr-models-recognition"]], "doctr.models.zoo": [[8, "doctr-models-zoo"]], "doctr.transforms": [[9, null]], "doctr.utils": [[10, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[7, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[7, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[9, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[6, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[9, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[9, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[6, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[6, "doctr.datasets.loader.DataLoader", false]], 
"db_mobilenet_v3_large() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[8, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[6, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[6, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[7, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[7, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[6, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[6, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[9, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[9, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[6, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[6, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[6, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[6, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[6, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[8, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[9, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[7, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[6, "doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() 
(in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[9, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[8, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[6, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[9, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[7, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[9, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[9, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[9, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[9, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[9, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[9, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[9, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[9, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[9, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[9, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[9, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[9, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[7, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[7, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[7, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[6, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[9, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[8, 
"doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[7, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[7, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[6, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[6, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[6, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[6, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[9, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[10, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[6, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[7, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[6, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[6, 0, 1, "", "CORD"], [6, 0, 1, "", "CharacterGenerator"], [6, 0, 1, "", "DetectionDataset"], [6, 0, 1, "", "DocArtefacts"], [6, 0, 1, "", "FUNSD"], [6, 0, 1, "", "IC03"], [6, 0, 1, "", "IC13"], [6, 0, 1, "", "IIIT5K"], [6, 0, 1, "", "IIITHWS"], [6, 0, 1, "", "IMGUR5K"], [6, 0, 1, "", "MJSynth"], [6, 0, 1, "", "OCRDataset"], [6, 0, 1, "", "RecognitionDataset"], [6, 0, 1, "", "SROIE"], [6, 0, 1, "", "SVHN"], [6, 0, 1, "", "SVT"], [6, 0, 1, "", "SynthText"], [6, 0, 1, "", 
"WILDRECEIPT"], [6, 0, 1, "", "WordGenerator"], [6, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[6, 0, 1, "", "DataLoader"]], "doctr.io": [[7, 0, 1, "", "Artefact"], [7, 0, 1, "", "Block"], [7, 0, 1, "", "Document"], [7, 0, 1, "", "DocumentFile"], [7, 0, 1, "", "Line"], [7, 0, 1, "", "Page"], [7, 0, 1, "", "Word"], [7, 1, 1, "", "decode_img_as_tensor"], [7, 1, 1, "", "read_html"], [7, 1, 1, "", "read_img_as_numpy"], [7, 1, 1, "", "read_img_as_tensor"], [7, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[7, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[7, 2, 1, "", "from_images"], [7, 2, 1, "", "from_pdf"], [7, 2, 1, "", "from_url"]], "doctr.io.Page": [[7, 2, 1, "", "show"]], "doctr.models": [[8, 1, 1, "", "kie_predictor"], [8, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[8, 1, 1, "", "crop_orientation_predictor"], [8, 1, 1, "", "magc_resnet31"], [8, 1, 1, "", "mobilenet_v3_large"], [8, 1, 1, "", "mobilenet_v3_large_r"], [8, 1, 1, "", "mobilenet_v3_small"], [8, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [8, 1, 1, "", "mobilenet_v3_small_page_orientation"], [8, 1, 1, "", "mobilenet_v3_small_r"], [8, 1, 1, "", "page_orientation_predictor"], [8, 1, 1, "", "resnet18"], [8, 1, 1, "", "resnet31"], [8, 1, 1, "", "resnet34"], [8, 1, 1, "", "resnet50"], [8, 1, 1, "", "textnet_base"], [8, 1, 1, "", "textnet_small"], [8, 1, 1, "", "textnet_tiny"], [8, 1, 1, "", "vgg16_bn_r"], [8, 1, 1, "", "vit_b"], [8, 1, 1, "", "vit_s"]], "doctr.models.detection": [[8, 1, 1, "", "db_mobilenet_v3_large"], [8, 1, 1, "", "db_resnet50"], [8, 1, 1, "", "detection_predictor"], [8, 1, 1, "", "fast_base"], [8, 1, 1, "", "fast_small"], [8, 1, 1, "", "fast_tiny"], [8, 1, 1, "", "linknet_resnet18"], [8, 1, 1, "", "linknet_resnet34"], [8, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[8, 1, 1, "", "from_hub"], [8, 1, 1, "", "login_to_hub"], [8, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[8, 1, 1, "", "crnn_mobilenet_v3_large"], [8, 1, 1, "", "crnn_mobilenet_v3_small"], [8, 1, 1, "", "crnn_vgg16_bn"], [8, 1, 1, "", "master"], [8, 1, 1, "", "parseq"], [8, 1, 1, "", "recognition_predictor"], [8, 1, 1, "", "sar_resnet31"], [8, 1, 1, "", "vitstr_base"], [8, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[9, 0, 1, "", "ChannelShuffle"], [9, 0, 1, "", "ColorInversion"], [9, 0, 1, "", "Compose"], [9, 0, 1, "", "GaussianBlur"], [9, 0, 1, "", "GaussianNoise"], [9, 0, 1, "", "LambdaTransformation"], [9, 0, 1, "", "Normalize"], [9, 0, 1, "", "OneOf"], [9, 0, 1, "", "RandomApply"], [9, 0, 1, "", "RandomBrightness"], [9, 0, 1, "", "RandomContrast"], [9, 0, 1, "", "RandomCrop"], [9, 0, 1, "", "RandomGamma"], [9, 0, 1, "", "RandomHorizontalFlip"], [9, 0, 1, "", "RandomHue"], [9, 0, 1, "", "RandomJpegQuality"], [9, 0, 1, "", "RandomResize"], [9, 0, 1, "", "RandomRotate"], [9, 0, 1, "", "RandomSaturation"], [9, 0, 1, "", "RandomShadow"], [9, 0, 1, "", "Resize"], [9, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[10, 0, 1, "", "DetectionMetric"], [10, 0, 1, "", "LocalizationConfusion"], [10, 0, 1, "", "OCRMetric"], [10, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.OCRMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.visualization": [[10, 1, 1, "", "visualize_page"]]}, 
"objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [1, 7, 8, 10, 14, 17], "0": [1, 3, 6, 9, 10, 12, 15, 16, 18], "00": 18, "01": 18, "0123456789": 6, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "02562": 8, "03": 18, "035": 18, "0361328125": 18, "04": 18, "05": 18, "06": 18, "06640625": 18, "07": 18, "08": [9, 18], "09": 18, "0966796875": 18, "1": [6, 7, 8, 9, 10, 12, 16, 18], "10": [6, 10, 18], "100": [6, 9, 10, 16, 18], "1000": 18, "101": 6, "1024": [8, 12, 18], "104": 6, "106": 6, "108": 6, "1095": 16, "11": 18, "110": 10, "1107": 16, "114": 6, "115": 6, "1156": 16, "116": 6, "118": 6, "11800h": 18, "11th": 18, "12": 18, "120": 6, "123": 6, "126": 6, "1268": 16, "128": [8, 12, 17, 18], "13": 18, "130": 6, "13068": 16, "131": 6, "1337891": 16, "1357421875": 18, "1396484375": 18, "14": 18, "1420": 18, "14470v1": 6, "149": 16, "15": 18, "150": [10, 18], "1552": 18, "16": [8, 17, 18], "1630859375": 18, "1684": 18, "16x16": 8, "17": 18, "1778": 18, "1782": 18, "18": [8, 18], "185546875": 18, "1900": 18, "1910": 8, "19342": 16, "19370": 16, "195": 6, "19598": 16, "199": 18, "1999": 18, "2": [3, 4, 6, 7, 9, 15, 18], "20": 18, "200": 10, "2000": 16, "2003": [4, 6], "2012": 6, "2013": [4, 6], "2015": 6, "2019": 4, "207901": 16, "21": 18, "2103": 6, "2186": 16, "21888": 16, "22": 18, "224": [8, 9], "225": 9, "22672": 16, "229": [9, 16], "23": 18, "233": 16, "236": 6, "24": 18, "246": 16, "249": 16, "25": 18, "2504": 18, "255": [7, 8, 9, 10, 18], "256": 8, "257": 16, "26": 18, "26032": 16, "264": 12, "27": 18, "2700": 16, "2710": 18, "2749": 12, "28": 18, "287": 12, "29": 18, "296": 12, "299": 12, "2d": 18, "3": [3, 4, 7, 8, 9, 10, 17, 18], "30": 18, "300": 16, "3000": 16, "301": 12, "30595": 18, "30ghz": 18, "31": 8, "32": [6, 8, 9, 12, 16, 17, 18], "3232421875": 18, "33": [9, 18], "33402": 16, "33608": 16, "34": [8, 18], "340": 18, "3456": 18, "3515625": 18, "36": 18, "360": 16, "37": [6, 18], "38": 18, "39": 18, "4": [8, 9, 10, 18], "40": 18, "406": 9, "41": 18, "42": 18, "43": 18, "44": 18, "45": 18, "456": 9, "46": 18, "47": 18, "472": 16, "48": [6, 18], "485": 9, "49": 18, "49377": 16, "5": [6, 9, 10, 15, 18], "50": [8, 16, 18], "51": 18, "51171875": 18, "512": 8, "52": [6, 18], "529": 18, "53": 18, "54": 18, "540": 18, "5478515625": 18, "55": 18, "56": 18, "57": 18, "58": [6, 18], "580": 18, "5810546875": 18, "583": 18, "59": 18, "597": 18, "5k": [4, 6], "5m": 18, "6": [9, 18], "60": 9, "600": [8, 10, 18], "61": 18, "62": 18, "626": 16, "63": 18, "64": [8, 9, 18], "641": 18, "647": 16, "65": 18, "66": 18, "67": 18, "68": 18, "69": 18, "693": 12, "694": 12, "695": 12, "6m": 18, "7": 18, "70": [6, 10, 18], "707470": 16, "71": [6, 18], "7100000": 16, "7141797": 16, "7149": 16, "72": 18, "72dpi": 7, "73": 18, "73257": 16, "74": 18, "75": [9, 18], "7581382": 16, "76": 18, "77": 18, "772": 12, "772875": 16, "78": 18, "785": 12, "79": 18, "793533": 16, "796": 16, "798": 12, "7m": 18, "8": [8, 9, 18], "80": 18, "800": [8, 10, 16, 18], "81": 18, "82": 18, "83": 18, "84": 18, "849": 16, "85": 18, "8564453125": 18, "857": 18, "85875": 16, "86": 18, "8603515625": 18, "87": 18, "8707": 16, "88": 18, "89": 18, "9": [3, 9, 18], "90": 18, "90k": 6, "90kdict32px": 6, "91": 18, "914085328578949": 18, "92": 18, "93": 18, "94": [6, 18], "95": 
[10, 18], "9578408598899841": 18, "96": 18, "97": 18, "98": 18, "99": 18, "9949972033500671": 18, "A": [1, 2, 4, 6, 7, 8, 11, 17], "As": 2, "Be": 18, "Being": 1, "By": 13, "For": [1, 2, 3, 12, 18], "If": [2, 7, 8, 12, 18], "In": [2, 6, 16], "It": [9, 14, 15, 17], "Its": [4, 8], "No": [1, 18], "Of": 6, "Or": [15, 17], "The": [1, 2, 6, 7, 10, 13, 15, 16, 17, 18], "Then": 8, "To": [2, 3, 13, 14, 15, 17, 18], "_": [1, 6, 8], "__call__": 18, "_build": 2, "_i": 10, "ab": 6, "abc": 17, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "abdef": [6, 16], "abl": [16, 18], "about": [1, 16, 18], "abov": 18, "abstractdataset": 6, "abus": 1, "accept": 1, "access": [4, 7, 16, 18], "account": [1, 14], "accur": 18, "accuraci": 10, "achiev": 17, "act": 1, "action": 1, "activ": 4, "ad": [2, 8, 9], "adapt": 1, "add": [9, 10, 14, 18], "add_hook": 18, "add_label": 10, "addit": [2, 3, 7, 15, 18], "addition": [2, 18], "address": [1, 7], "adjust": 9, "advanc": 1, "advantag": 17, "advis": 2, "aesthet": [4, 6], "affect": 1, "after": [14, 18], "ag": 1, "again": 8, "aggreg": [10, 16], "aggress": 1, "align": [1, 7, 9], "all": [1, 2, 5, 6, 7, 9, 10, 15, 16, 18], "allow": [1, 17], "along": 18, "alreadi": [2, 17], "also": [1, 8, 14, 15, 16, 18], "alwai": 16, "an": [1, 2, 4, 6, 7, 8, 10, 15, 17, 18], "analysi": [7, 15], "ancient_greek": 6, "angl": [7, 9], "ani": [1, 6, 7, 8, 9, 10, 17, 18], "annot": 6, "anot": 16, "anoth": [8, 12, 16], "answer": 1, "anyascii": 10, "anyon": 4, "anyth": 15, "api": [2, 4], "apolog": 1, "apologi": 1, "app": 2, "appear": 1, "appli": [1, 6, 9], "applic": [4, 8], "appoint": 1, "appreci": 14, "appropri": [1, 2, 18], "ar": [1, 2, 3, 5, 6, 7, 9, 10, 11, 15, 16, 18], "arab": 6, "arabic_diacrit": 6, "arabic_lett": 6, "arabic_punctu": 6, "arbitrarili": [4, 8], "arch": [8, 14], "architectur": [4, 8, 14, 15], "area": 18, "argument": [6, 7, 8, 10, 12, 18], "around": 1, "arrai": [7, 9, 10], "art": [4, 15], "artefact": [10, 15, 18], "artefact_typ": 7, "artifici": [4, 6], "arxiv": [6, 8], "asarrai": 10, "ascii_lett": 6, "aspect": [4, 8, 9, 18], "assess": 10, "assign": 10, "associ": 7, "assum": 8, "assume_straight_pag": [8, 12, 18], "astyp": [8, 10, 18], "attack": 1, "attend": [4, 8], "attent": [1, 8], "autom": 4, "automat": 18, "autoregress": [4, 8], "avail": [1, 4, 5, 9], "averag": [9, 18], "avoid": [1, 3], "aw": [4, 18], "awar": 18, "azur": 18, "b": [8, 10, 18], "b_j": 10, "back": 2, "backbon": 8, "backend": 18, "background": 16, "bangla": 6, "bar": 15, "bar_cod": 16, "base": [4, 8, 15], "baselin": [4, 8, 18], "batch": [6, 8, 9, 15, 16, 18], "batch_siz": [6, 12, 15, 16, 17], "bblanchon": 3, "bbox": 18, "becaus": 13, "been": [2, 10, 16, 18], "befor": [6, 8, 9, 18], "begin": 10, "behavior": [1, 18], "being": [10, 18], "belong": 18, "benchmark": 18, "best": 1, "better": [11, 18], "between": [9, 10, 18], "bgr": 7, "bilinear": 9, "bin_thresh": 18, "binar": [4, 8, 18], "binari": [7, 17, 18], "bit": 17, "block": [10, 18], "block_1_1": 18, "blur": 9, "bmvc": 6, "bn": 14, "bodi": [1, 18], "bool": [6, 7, 8, 9, 10], "boolean": [8, 18], "both": [4, 6, 9, 16, 18], "bottom": [8, 18], "bound": [6, 7, 8, 9, 10, 15, 16, 18], "box": [6, 7, 8, 9, 10, 15, 16, 18], "box_thresh": 18, "bright": 9, "browser": [2, 4], "build": [2, 3, 17], "built": 2, "byte": [7, 18], "c": [3, 7, 10], "c_j": 10, "cach": [2, 6, 13], "cache_sampl": 6, "call": 17, "callabl": [6, 9], "can": [2, 3, 12, 13, 14, 15, 16, 18], "capabl": [2, 11, 18], "case": [6, 10], "cf": 18, "cfg": 18, "challeng": 6, "challenge2_test_task12_imag": 6, 
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[1, "correction"]], "2. Warning": [[1, "warning"]], "3. Temporary Ban": [[1, "temporary-ban"]], "4. Permanent Ban": [[1, "permanent-ban"]], "AWS Lambda": [[13, null]], "Advanced options": [[18, "advanced-options"]], "Args:": [[6, "args"], [6, "id4"], [6, "id7"], [6, "id10"], [6, "id13"], [6, "id16"], [6, "id19"], [6, "id22"], [6, "id25"], [6, "id29"], [6, "id32"], [6, "id37"], [6, "id40"], [6, "id46"], [6, "id49"], [6, "id50"], [6, "id51"], [6, "id54"], [6, "id57"], [6, "id60"], [6, "id61"], [7, "args"], [7, "id2"], [7, "id3"], [7, "id4"], [7, "id5"], [7, "id6"], [7, "id7"], [7, "id10"], [7, "id12"], [7, "id14"], [7, "id16"], [7, "id20"], [7, "id24"], [7, "id28"], [8, "args"], [8, "id3"], [8, "id8"], [8, "id13"], [8, "id17"], [8, "id21"], [8, "id26"], [8, "id31"], [8, "id36"], [8, "id41"], [8, "id46"], [8, "id50"], [8, "id54"], [8, "id59"], [8, "id63"], [8, "id68"], [8, "id73"], [8, "id77"], [8, "id81"], [8, "id85"], [8, "id90"], [8, "id95"], [8, "id99"], [8, "id104"], [8, "id109"], [8, "id114"], [8, "id119"], [8, "id123"], [8, "id127"], [8, "id132"], [8, "id137"], [8, "id142"], [8, "id146"], [8, "id150"], [8, "id155"], [8, "id159"], [8, "id163"], [8, "id167"], [8, "id169"], [8, "id171"], [8, "id173"], [9, "args"], [9, "id1"], [9, "id2"], [9, "id3"], [9, "id4"], [9, "id5"], [9, "id6"], [9, "id7"], [9, "id8"], [9, "id9"], [9, "id10"], [9, "id11"], [9, "id12"], [9, "id13"], [9, "id14"], [9, "id15"], [9, "id16"], [9, "id17"], [9, "id18"], [9, "id19"], [10, "args"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"]], "Artefact": [[7, "artefact"]], "ArtefactDetection": [[15, "artefactdetection"]], "Attribution": [[1, "attribution"]], "Available Datasets": [[16, "available-datasets"]], "Available architectures": [[18, "available-architectures"], [18, "id1"], [18, "id2"]], "Available contribution modules": [[15, "available-contribution-modules"]], "Block": [[7, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[16, null]], "Choosing the right model": [[18, null]], "Classification": [[14, "classification"]], "Code quality": [[2, "code-quality"]], "Code style verification": [[2, "code-style-verification"]], "Codebase structure": [[2, "codebase-structure"]], "Commits": [[2, "commits"]], "Composing transformations": [[9, "composing-transformations"]], "Continuous Integration": [[2, "continuous-integration"]], "Contributing to docTR": [[2, null]], "Contributor Covenant Code of Conduct": [[1, null]], "Custom dataset loader": [[6, "custom-dataset-loader"]], "Custom orientation classification models": [[12, "custom-orientation-classification-models"]], "Data Loading": [[16, "data-loading"]], "Dataloader": [[6, "dataloader"]], "Detection": [[14, "detection"], [16, "detection"]], "Detection predictors": [[18, "detection-predictors"]], "Developer mode installation": [[2, "developer-mode-installation"]], "Developing docTR": [[2, "developing-doctr"]], "Document": [[7, "document"]], "Document structure": [[7, "document-structure"]], "End-to-End OCR": [[18, "end-to-end-ocr"]], "Enforcement": [[1, "enforcement"]], "Enforcement Guidelines": [[1, "enforcement-guidelines"]], "Enforcement Responsibilities": [[1, "enforcement-responsibilities"]], "Export to ONNX": [[17, "export-to-onnx"]], "Feature requests & bug report": [[2, "feature-requests-bug-report"]], "Feedback": [[2, "feedback"]], "File reading": [[7, "file-reading"]], "Half-precision": [[17, "half-precision"]], "Installation": [[3, null]], 
"Integrate contributions into your pipeline": [[15, null]], "Let\u2019s connect": [[2, "let-s-connect"]], "Line": [[7, "line"]], "Loading from Huggingface Hub": [[14, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[12, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[12, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[4, "main-features"]], "Model optimization": [[17, "model-optimization"]], "Model zoo": [[4, "model-zoo"]], "Modifying the documentation": [[2, "modifying-the-documentation"]], "Naming conventions": [[14, "naming-conventions"]], "OCR": [[16, "ocr"]], "Object Detection": [[16, "object-detection"]], "Our Pledge": [[1, "our-pledge"]], "Our Standards": [[1, "our-standards"]], "Page": [[7, "page"]], "Preparing your model for inference": [[17, null]], "Prerequisites": [[3, "prerequisites"]], "Pretrained community models": [[14, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[14, "pushing-to-the-huggingface-hub"]], "Questions": [[2, "questions"]], "Recognition": [[14, "recognition"], [16, "recognition"]], "Recognition predictors": [[18, "recognition-predictors"]], "Returns:": [[6, "returns"], [7, "returns"], [7, "id11"], [7, "id13"], [7, "id15"], [7, "id19"], [7, "id23"], [7, "id27"], [7, "id31"], [8, "returns"], [8, "id6"], [8, "id11"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id29"], [8, "id34"], [8, "id39"], [8, "id44"], [8, "id49"], [8, "id53"], [8, "id57"], [8, "id62"], [8, "id66"], [8, "id71"], [8, "id76"], [8, "id80"], [8, "id84"], [8, "id88"], [8, "id93"], [8, "id98"], [8, "id102"], [8, "id107"], [8, "id112"], [8, "id117"], [8, "id122"], [8, "id126"], [8, "id130"], [8, "id135"], [8, "id140"], [8, "id145"], [8, "id149"], [8, "id153"], [8, "id158"], [8, "id162"], [8, "id166"], [8, "id168"], [8, "id170"], [8, "id172"], [10, "returns"]], "Scope": [[1, "scope"]], "Share your model with the community": [[14, null]], "Supported Vocabs": [[6, "supported-vocabs"]], "Supported contribution modules": [[5, "supported-contribution-modules"]], "Supported datasets": [[4, "supported-datasets"]], "Supported transformations": [[9, "supported-transformations"]], "Synthetic dataset generator": [[6, "synthetic-dataset-generator"], [16, "synthetic-dataset-generator"]], "Task evaluation": [[10, "task-evaluation"]], "Text Detection": [[18, "text-detection"]], "Text Recognition": [[18, "text-recognition"]], "Text detection models": [[4, "text-detection-models"]], "Text recognition models": [[4, "text-recognition-models"]], "Train your own model": [[12, null]], "Two-stage approaches": [[18, "two-stage-approaches"]], "Unit tests": [[2, "unit-tests"]], "Use your own datasets": [[16, "use-your-own-datasets"]], "Using your ONNX exported model": [[17, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[3, "via-conda-only-for-linux"]], "Via Git": [[3, "via-git"]], "Via Python Package": [[3, "via-python-package"]], "Visualization": [[10, "visualization"]], "What should I do with the output?": [[18, "what-should-i-do-with-the-output"]], "Word": [[7, "word"]], "docTR Notebooks": [[11, null]], "docTR Vocabs": [[6, "id62"]], "docTR: Document Text Recognition": [[4, null]], "doctr.contrib": [[5, null]], "doctr.datasets": [[6, null], [6, "datasets"]], "doctr.io": [[7, null]], "doctr.models": [[8, null]], "doctr.models.classification": [[8, "doctr-models-classification"]], "doctr.models.detection": [[8, "doctr-models-detection"]], "doctr.models.factory": [[8, 
"doctr-models-factory"]], "doctr.models.recognition": [[8, "doctr-models-recognition"]], "doctr.models.zoo": [[8, "doctr-models-zoo"]], "doctr.transforms": [[9, null]], "doctr.utils": [[10, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[7, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[7, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[9, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[6, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[9, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[9, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[6, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[6, "doctr.datasets.loader.DataLoader", false]], 
"db_mobilenet_v3_large() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[8, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[6, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[6, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[7, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[7, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[6, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[6, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[9, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[9, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[6, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[6, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[6, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[6, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[6, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[8, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[9, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[7, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[6, "doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() 
(in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[9, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[8, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[6, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[9, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[7, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[9, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[9, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[9, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[9, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[9, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[9, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[9, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[9, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[9, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[9, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[9, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[9, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[7, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[7, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[7, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[6, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[9, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[8, 
"doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[7, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[7, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[6, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[6, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[6, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[6, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[9, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[10, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[6, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[7, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[6, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[6, 0, 1, "", "CORD"], [6, 0, 1, "", "CharacterGenerator"], [6, 0, 1, "", "DetectionDataset"], [6, 0, 1, "", "DocArtefacts"], [6, 0, 1, "", "FUNSD"], [6, 0, 1, "", "IC03"], [6, 0, 1, "", "IC13"], [6, 0, 1, "", "IIIT5K"], [6, 0, 1, "", "IIITHWS"], [6, 0, 1, "", "IMGUR5K"], [6, 0, 1, "", "MJSynth"], [6, 0, 1, "", "OCRDataset"], [6, 0, 1, "", "RecognitionDataset"], [6, 0, 1, "", "SROIE"], [6, 0, 1, "", "SVHN"], [6, 0, 1, "", "SVT"], [6, 0, 1, "", "SynthText"], [6, 0, 1, "", 
"WILDRECEIPT"], [6, 0, 1, "", "WordGenerator"], [6, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[6, 0, 1, "", "DataLoader"]], "doctr.io": [[7, 0, 1, "", "Artefact"], [7, 0, 1, "", "Block"], [7, 0, 1, "", "Document"], [7, 0, 1, "", "DocumentFile"], [7, 0, 1, "", "Line"], [7, 0, 1, "", "Page"], [7, 0, 1, "", "Word"], [7, 1, 1, "", "decode_img_as_tensor"], [7, 1, 1, "", "read_html"], [7, 1, 1, "", "read_img_as_numpy"], [7, 1, 1, "", "read_img_as_tensor"], [7, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[7, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[7, 2, 1, "", "from_images"], [7, 2, 1, "", "from_pdf"], [7, 2, 1, "", "from_url"]], "doctr.io.Page": [[7, 2, 1, "", "show"]], "doctr.models": [[8, 1, 1, "", "kie_predictor"], [8, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[8, 1, 1, "", "crop_orientation_predictor"], [8, 1, 1, "", "magc_resnet31"], [8, 1, 1, "", "mobilenet_v3_large"], [8, 1, 1, "", "mobilenet_v3_large_r"], [8, 1, 1, "", "mobilenet_v3_small"], [8, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [8, 1, 1, "", "mobilenet_v3_small_page_orientation"], [8, 1, 1, "", "mobilenet_v3_small_r"], [8, 1, 1, "", "page_orientation_predictor"], [8, 1, 1, "", "resnet18"], [8, 1, 1, "", "resnet31"], [8, 1, 1, "", "resnet34"], [8, 1, 1, "", "resnet50"], [8, 1, 1, "", "textnet_base"], [8, 1, 1, "", "textnet_small"], [8, 1, 1, "", "textnet_tiny"], [8, 1, 1, "", "vgg16_bn_r"], [8, 1, 1, "", "vit_b"], [8, 1, 1, "", "vit_s"]], "doctr.models.detection": [[8, 1, 1, "", "db_mobilenet_v3_large"], [8, 1, 1, "", "db_resnet50"], [8, 1, 1, "", "detection_predictor"], [8, 1, 1, "", "fast_base"], [8, 1, 1, "", "fast_small"], [8, 1, 1, "", "fast_tiny"], [8, 1, 1, "", "linknet_resnet18"], [8, 1, 1, "", "linknet_resnet34"], [8, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[8, 1, 1, "", "from_hub"], [8, 1, 1, "", "login_to_hub"], [8, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[8, 1, 1, "", "crnn_mobilenet_v3_large"], [8, 1, 1, "", "crnn_mobilenet_v3_small"], [8, 1, 1, "", "crnn_vgg16_bn"], [8, 1, 1, "", "master"], [8, 1, 1, "", "parseq"], [8, 1, 1, "", "recognition_predictor"], [8, 1, 1, "", "sar_resnet31"], [8, 1, 1, "", "vitstr_base"], [8, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[9, 0, 1, "", "ChannelShuffle"], [9, 0, 1, "", "ColorInversion"], [9, 0, 1, "", "Compose"], [9, 0, 1, "", "GaussianBlur"], [9, 0, 1, "", "GaussianNoise"], [9, 0, 1, "", "LambdaTransformation"], [9, 0, 1, "", "Normalize"], [9, 0, 1, "", "OneOf"], [9, 0, 1, "", "RandomApply"], [9, 0, 1, "", "RandomBrightness"], [9, 0, 1, "", "RandomContrast"], [9, 0, 1, "", "RandomCrop"], [9, 0, 1, "", "RandomGamma"], [9, 0, 1, "", "RandomHorizontalFlip"], [9, 0, 1, "", "RandomHue"], [9, 0, 1, "", "RandomJpegQuality"], [9, 0, 1, "", "RandomResize"], [9, 0, 1, "", "RandomRotate"], [9, 0, 1, "", "RandomSaturation"], [9, 0, 1, "", "RandomShadow"], [9, 0, 1, "", "Resize"], [9, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[10, 0, 1, "", "DetectionMetric"], [10, 0, 1, "", "LocalizationConfusion"], [10, 0, 1, "", "OCRMetric"], [10, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.OCRMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.visualization": [[10, 1, 1, "", "visualize_page"]]}, 
"objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [1, 7, 8, 10, 14, 17], "0": [1, 3, 6, 9, 10, 12, 15, 16, 18], "00": 18, "01": 18, "0123456789": 6, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "02562": 8, "03": 18, "035": 18, "0361328125": 18, "04": 18, "05": 18, "06": 18, "06640625": 18, "07": 18, "08": [9, 18], "09": 18, "0966796875": 18, "1": [6, 7, 8, 9, 10, 12, 16, 18], "10": [6, 10, 18], "100": [6, 9, 10, 16, 18], "1000": 18, "101": 6, "1024": [8, 12, 18], "104": 6, "106": 6, "108": 6, "1095": 16, "11": 18, "110": 10, "1107": 16, "114": 6, "115": 6, "1156": 16, "116": 6, "118": 6, "11800h": 18, "11th": 18, "12": 18, "120": 6, "123": 6, "126": 6, "1268": 16, "128": [8, 12, 17, 18], "13": 18, "130": 6, "13068": 16, "131": 6, "1337891": 16, "1357421875": 18, "1396484375": 18, "14": 18, "1420": 18, "14470v1": 6, "149": 16, "15": 18, "150": [10, 18], "1552": 18, "16": [8, 17, 18], "1630859375": 18, "1684": 18, "16x16": 8, "17": 18, "1778": 18, "1782": 18, "18": [8, 18], "185546875": 18, "1900": 18, "1910": 8, "19342": 16, "19370": 16, "195": 6, "19598": 16, "199": 18, "1999": 18, "2": [3, 4, 6, 7, 9, 15, 18], "20": 18, "200": 10, "2000": 16, "2003": [4, 6], "2012": 6, "2013": [4, 6], "2015": 6, "2019": 4, "207901": 16, "21": 18, "2103": 6, "2186": 16, "21888": 16, "22": 18, "224": [8, 9], "225": 9, "22672": 16, "229": [9, 16], "23": 18, "233": 16, "236": 6, "24": 18, "246": 16, "249": 16, "25": 18, "2504": 18, "255": [7, 8, 9, 10, 18], "256": 8, "257": 16, "26": 18, "26032": 16, "264": 12, "27": 18, "2700": 16, "2710": 18, "2749": 12, "28": 18, "287": 12, "29": 18, "296": 12, "299": 12, "2d": 18, "3": [3, 4, 7, 8, 9, 10, 17, 18], "30": 18, "300": 16, "3000": 16, "301": 12, "30595": 18, "30ghz": 18, "31": 8, "32": [6, 8, 9, 12, 16, 17, 18], "3232421875": 18, "33": [9, 18], "33402": 16, "33608": 16, "34": [8, 18], "340": 18, "3456": 18, "3515625": 18, "36": 18, "360": 16, "37": [6, 18], "38": 18, "39": 18, "4": [8, 9, 10, 18], "40": 18, "406": 9, "41": 18, "42": 18, "43": 18, "44": 18, "45": 18, "456": 9, "46": 18, "47": 18, "472": 16, "48": [6, 18], "485": 9, "49": 18, "49377": 16, "5": [6, 9, 10, 15, 18], "50": [8, 16, 18], "51": 18, "51171875": 18, "512": 8, "52": [6, 18], "529": 18, "53": 18, "54": 18, "540": 18, "5478515625": 18, "55": 18, "56": 18, "57": 18, "58": [6, 18], "580": 18, "5810546875": 18, "583": 18, "59": 18, "597": 18, "5k": [4, 6], "5m": 18, "6": [9, 18], "60": 9, "600": [8, 10, 18], "61": 18, "62": 18, "626": 16, "63": 18, "64": [8, 9, 18], "641": 18, "647": 16, "65": 18, "66": 18, "67": 18, "68": 18, "69": 18, "693": 12, "694": 12, "695": 12, "6m": 18, "7": 18, "70": [6, 10, 18], "707470": 16, "71": [6, 18], "7100000": 16, "7141797": 16, "7149": 16, "72": 18, "72dpi": 7, "73": 18, "73257": 16, "74": 18, "75": [9, 18], "7581382": 16, "76": 18, "77": 18, "772": 12, "772875": 16, "78": 18, "785": 12, "79": 18, "793533": 16, "796": 16, "798": 12, "7m": 18, "8": [8, 9, 18], "80": 18, "800": [8, 10, 16, 18], "81": 18, "82": 18, "83": 18, "84": 18, "849": 16, "85": 18, "8564453125": 18, "857": 18, "85875": 16, "86": 18, "8603515625": 18, "87": 18, "8707": 16, "88": 18, "89": 18, "9": [3, 9, 18], "90": 18, "90k": 6, "90kdict32px": 6, "91": 18, "914085328578949": 18, "92": 18, "93": 18, "94": [6, 18], "95": 
[10, 18], "9578408598899841": 18, "96": 18, "97": 18, "98": 18, "99": 18, "9949972033500671": 18, "A": [1, 2, 4, 6, 7, 8, 11, 17], "As": 2, "Be": 18, "Being": 1, "By": 13, "For": [1, 2, 3, 12, 18], "If": [2, 7, 8, 12, 18], "In": [2, 6, 16], "It": [9, 14, 15, 17], "Its": [4, 8], "No": [1, 18], "Of": 6, "Or": [15, 17], "The": [1, 2, 6, 7, 10, 13, 15, 16, 17, 18], "Then": 8, "To": [2, 3, 13, 14, 15, 17, 18], "_": [1, 6, 8], "__call__": 18, "_build": 2, "_i": 10, "ab": 6, "abc": 17, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "abdef": [6, 16], "abl": [16, 18], "about": [1, 16, 18], "abov": 18, "abstractdataset": 6, "abus": 1, "accept": 1, "access": [4, 7, 16, 18], "account": [1, 14], "accur": 18, "accuraci": 10, "achiev": 17, "act": 1, "action": 1, "activ": 4, "ad": [2, 8, 9], "adapt": 1, "add": [9, 10, 14, 18], "add_hook": 18, "add_label": 10, "addit": [2, 3, 7, 15, 18], "addition": [2, 18], "address": [1, 7], "adjust": 9, "advanc": 1, "advantag": 17, "advis": 2, "aesthet": [4, 6], "affect": 1, "after": [14, 18], "ag": 1, "again": 8, "aggreg": [10, 16], "aggress": 1, "align": [1, 7, 9], "all": [1, 2, 5, 6, 7, 9, 10, 15, 16, 18], "allow": [1, 17], "along": 18, "alreadi": [2, 17], "also": [1, 8, 14, 15, 16, 18], "alwai": 16, "an": [1, 2, 4, 6, 7, 8, 10, 15, 17, 18], "analysi": [7, 15], "ancient_greek": 6, "angl": [7, 9], "ani": [1, 6, 7, 8, 9, 10, 17, 18], "annot": 6, "anot": 16, "anoth": [8, 12, 16], "answer": 1, "anyascii": 10, "anyon": 4, "anyth": 15, "api": [2, 4], "apolog": 1, "apologi": 1, "app": 2, "appear": 1, "appli": [1, 6, 9], "applic": [4, 8], "appoint": 1, "appreci": 14, "appropri": [1, 2, 18], "ar": [1, 2, 3, 5, 6, 7, 9, 10, 11, 15, 16, 18], "arab": 6, "arabic_diacrit": 6, "arabic_lett": 6, "arabic_punctu": 6, "arbitrarili": [4, 8], "arch": [8, 14], "architectur": [4, 8, 14, 15], "area": 18, "argument": [6, 7, 8, 10, 12, 18], "around": 1, "arrai": [7, 9, 10], "art": [4, 15], "artefact": [10, 15, 18], "artefact_typ": 7, "artifici": [4, 6], "arxiv": [6, 8], "asarrai": 10, "ascii_lett": 6, "aspect": [4, 8, 9, 18], "assess": 10, "assign": 10, "associ": 7, "assum": 8, "assume_straight_pag": [8, 12, 18], "astyp": [8, 10, 18], "attack": 1, "attend": [4, 8], "attent": [1, 8], "autom": 4, "automat": 18, "autoregress": [4, 8], "avail": [1, 4, 5, 9], "averag": [9, 18], "avoid": [1, 3], "aw": [4, 18], "awar": 18, "azur": 18, "b": [8, 10, 18], "b_j": 10, "back": 2, "backbon": 8, "backend": 18, "background": 16, "bangla": 6, "bar": 15, "bar_cod": 16, "base": [4, 8, 15], "baselin": [4, 8, 18], "batch": [6, 8, 9, 15, 16, 18], "batch_siz": [6, 12, 15, 16, 17], "bblanchon": 3, "bbox": 18, "becaus": 13, "been": [2, 10, 16, 18], "befor": [6, 8, 9, 18], "begin": 10, "behavior": [1, 18], "being": [10, 18], "belong": 18, "benchmark": 18, "best": 1, "better": [11, 18], "between": [9, 10, 18], "bgr": 7, "bilinear": 9, "bin_thresh": 18, "binar": [4, 8, 18], "binari": [7, 17, 18], "bit": 17, "block": [10, 18], "block_1_1": 18, "blur": 9, "bmvc": 6, "bn": 14, "bodi": [1, 18], "bool": [6, 7, 8, 9, 10], "boolean": [8, 18], "both": [4, 6, 9, 16, 18], "bottom": [8, 18], "bound": [6, 7, 8, 9, 10, 15, 16, 18], "box": [6, 7, 8, 9, 10, 15, 16, 18], "box_thresh": 18, "bright": 9, "browser": [2, 4], "build": [2, 3, 17], "built": 2, "byte": [7, 18], "c": [3, 7, 10], "c_j": 10, "cach": [2, 6, 13], "cache_sampl": 6, "call": 17, "callabl": [6, 9], "can": [2, 3, 12, 13, 14, 15, 16, 18], "capabl": [2, 11, 18], "case": [6, 10], "cf": 18, "cfg": 18, "challeng": 6, "challenge2_test_task12_imag": 6, 
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
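For orientation, since the regenerated searchindex.js above is a single minified line: Search.setIndex() receives an object whose "terms" map apparently keys stemmed tokens to document numbers that resolve through the "titles" array (readable near the end of the blob). A minimal lookup sketch against that shape, as an illustration only; the real searchtools.js also stems the query, scores hits, and merges the object and title indexes:

    // Minimal sketch of a term lookup against the index shape above (illustration only).
    function lookupTitles(index, token) {
      const hit = index.terms[token]; // a single document number or an array of them
      if (hit === undefined) return [];
      return (Array.isArray(hit) ? hit : [hit]).map((i) => index.titles[i]);
    }
    // e.g. lookupTitles(searchIndex, "onnx") yields the titles of pages whose text mentions ONNX.
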
diff --git a/using_doctr/custom_models_training.html b/using_doctr/custom_models_training.html
index 580b4368b7..e664c6a950 100644
--- a/using_doctr/custom_models_training.html
+++ b/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -615,7 +615,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/using_doctr/running_on_aws.html b/using_doctr/running_on_aws.html
index ddb0c3c80f..81c38b49f5 100644
--- a/using_doctr/running_on_aws.html
+++ b/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -358,7 +358,7 @@ AWS Lambda
-
+
diff --git a/using_doctr/sharing_models.html b/using_doctr/sharing_models.html
index 07a3b2f2a3..4f5d1d68a5 100644
--- a/using_doctr/sharing_models.html
+++ b/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -540,7 +540,7 @@ Recognition
-
+
diff --git a/using_doctr/using_contrib_modules.html b/using_doctr/using_contrib_modules.html
index b4a10925e6..cf282ff3a4 100644
--- a/using_doctr/using_contrib_modules.html
+++ b/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -411,7 +411,7 @@ ArtefactDetection
-
+
diff --git a/using_doctr/using_datasets.html b/using_doctr/using_datasets.html
index 4a52df36ba..e30b6d6459 100644
--- a/using_doctr/using_datasets.html
+++ b/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -638,7 +638,7 @@ Data Loading
-
+
diff --git a/using_doctr/using_model_export.html b/using_doctr/using_model_export.html
index 2b30ee63a1..ad9d09ed4c 100644
--- a/using_doctr/using_model_export.html
+++ b/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -463,7 +463,7 @@ Using your ONNX exported model
-
+
diff --git a/using_doctr/using_models.html b/using_doctr/using_models.html
index 13cb06116b..5c80dbf62d 100644
--- a/using_doctr/using_models.html
+++ b/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1249,7 +1249,7 @@ Advanced options
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/cord.html b/v0.1.0/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.0/_modules/doctr/datasets/cord.html
+++ b/v0.1.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/detection.html b/v0.1.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.0/_modules/doctr/datasets/detection.html
+++ b/v0.1.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/funsd.html b/v0.1.0/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.0/_modules/doctr/datasets/funsd.html
+++ b/v0.1.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic03.html b/v0.1.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.0/_modules/doctr/datasets/ic03.html
+++ b/v0.1.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic13.html b/v0.1.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.0/_modules/doctr/datasets/ic13.html
+++ b/v0.1.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiit5k.html b/v0.1.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiithws.html b/v0.1.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/imgur5k.html b/v0.1.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/loader.html b/v0.1.0/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.0/_modules/doctr/datasets/loader.html
+++ b/v0.1.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/mjsynth.html b/v0.1.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ocr.html b/v0.1.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.0/_modules/doctr/datasets/ocr.html
+++ b/v0.1.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/recognition.html b/v0.1.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.0/_modules/doctr/datasets/recognition.html
+++ b/v0.1.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/sroie.html b/v0.1.0/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.0/_modules/doctr/datasets/sroie.html
+++ b/v0.1.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svhn.html b/v0.1.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.0/_modules/doctr/datasets/svhn.html
+++ b/v0.1.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svt.html b/v0.1.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.0/_modules/doctr/datasets/svt.html
+++ b/v0.1.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/synthtext.html b/v0.1.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/utils.html b/v0.1.0/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.0/_modules/doctr/datasets/utils.html
+++ b/v0.1.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/wildreceipt.html b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.0/_modules/doctr/io/elements.html b/v0.1.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.0/_modules/doctr/io/elements.html
+++ b/v0.1.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.0/_modules/doctr/io/html.html b/v0.1.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.0/_modules/doctr/io/html.html
+++ b/v0.1.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/base.html b/v0.1.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.0/_modules/doctr/io/image/base.html
+++ b/v0.1.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/tensorflow.html b/v0.1.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/io/pdf.html b/v0.1.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.0/_modules/doctr/io/pdf.html
+++ b/v0.1.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.0/_modules/doctr/io/reader.html b/v0.1.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.0/_modules/doctr/io/reader.html
+++ b/v0.1.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/zoo.html b/v0.1.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/zoo.html b/v0.1.0/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/factory/hub.html b/v0.1.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.0/_modules/doctr/models/factory/hub.html
+++ b/v0.1.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/zoo.html b/v0.1.0/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/zoo.html b/v0.1.0/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.0/_modules/doctr/models/zoo.html
+++ b/v0.1.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/base.html b/v0.1.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/utils/metrics.html b/v0.1.0/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.0/_modules/doctr/utils/metrics.html
+++ b/v0.1.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.0/_modules/doctr/utils/visualization.html b/v0.1.0/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.0/_modules/doctr/utils/visualization.html
+++ b/v0.1.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.0/_modules/index.html b/v0.1.0/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.0/_modules/index.html
+++ b/v0.1.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.0/_sources/getting_started/installing.rst.txt b/v0.1.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.0/_sources/getting_started/installing.rst.txt
+++ b/v0.1.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.0/_static/basic.css b/v0.1.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.0/_static/basic.css
+++ b/v0.1.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.0/_static/doctools.js b/v0.1.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.0/_static/doctools.js
+++ b/v0.1.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.0/_static/language_data.js b/v0.1.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.0/_static/language_data.js
+++ b/v0.1.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.0/_static/searchtools.js b/v0.1.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.0/_static/searchtools.js
+++ b/v0.1.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
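Note: the commented template above is the documented override hook — searchtools.js only installs its default `Scorer` when none is defined (`typeof Scorer === "undefined"`). A hedged sketch of a theme-side override using the new `kind` field (the +5 boost is an arbitrary illustration; a complete override would also need to supply the weight fields the default Scorer defines):

const Scorer = {
  score: (result) => {
    const [docname, title, anchor, descr, score, filename, kind] = result;
    // Favour title matches over plain full-text hits.
    return kind === "title" ? score + 5 : score;
  },
};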
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
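Note: the `SearchResultKind` class added in this hunk uses static getters, which makes the four constants read-only without needing `Object.freeze`. An equivalent shape as a sketch (the name `SearchResultKindAlt` is hypothetical):

// Since JS has no enum type, a frozen plain object expresses the same set:
const SearchResultKindAlt = Object.freeze({
  index: "index",
  object: "object",
  text: "text",
  title: "title",
});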
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
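Note: with `_displayItem` now tagging every result `<li>` with a `kind-<kind>` class, themes get a per-result-type hook for styling and scripting. A hypothetical theme-side sketch (the `result-title` class is illustrative, not a Sphinx convention):

// Restyle title matches after a search has rendered.
document.querySelectorAll("#search-results li.kind-title")
  .forEach((li) => li.classList.add("result-title"));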
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
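Note: the status message now goes through `Documentation.ngettext` (defined in Sphinx's doctools.js, which this commit also touches) so translations can pluralize properly instead of relying on the "page(s)" shortcut. A minimal sketch of what an ngettext-style helper does — illustrative only, not the doctools.js implementation:

const ngettextSketch = (singular, plural, n) => (n === 1 ? singular : plural);

ngettextSketch("found one page", "found ${resultCount} pages", 3)
  .replace("${resultCount}", 3); // -> "found 3 pages"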
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
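Note: the comparator body is elided by the hunk, but the updated comment pins down the contract. A sketch consistent with it (not the verbatim Sphinx implementation): sort ascending by score so that `pop()` yields the best hit first, breaking ties reverse-alphabetically so equal-score results display A→Z.

const orderByScoreThenNameSketch = (a, b) => {
  if (a[4] !== b[4]) return a[4] - b[4]; // score is field 4; ascending for pop()
  return b[1].localeCompare(a[1]);       // title is field 1; reverse-alpha
};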
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
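Note: taken together, the four `SearchResultKind` insertions above tag each result group produced by the query pipeline. A hypothetical summary of the mapping (the object and key names are illustrative, not Sphinx identifiers):

const kindByGroupSketch = {
  titleMatches: SearchResultKind.title,   // matches against document titles
  indexEntries: SearchResultKind.index,   // matches against index entries
  objectMatches: SearchResultKind.object, // matches against Python objects
  termMatches: SearchResultKind.text,     // plain full-text term matches
};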
diff --git a/v0.1.0/changelog.html b/v0.1.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.0/changelog.html
+++ b/v0.1.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.0/community/resources.html b/v0.1.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.0/community/resources.html
+++ b/v0.1.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.0/contributing/code_of_conduct.html b/v0.1.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.0/contributing/code_of_conduct.html
+++ b/v0.1.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.0/contributing/contributing.html b/v0.1.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.0/contributing/contributing.html
+++ b/v0.1.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.0/genindex.html b/v0.1.0/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.0/genindex.html
+++ b/v0.1.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.0/getting_started/installing.html b/v0.1.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.0/getting_started/installing.html
+++ b/v0.1.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.0/index.html b/v0.1.0/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.0/index.html
+++ b/v0.1.0/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.0/modules/contrib.html b/v0.1.0/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.0/modules/contrib.html
+++ b/v0.1.0/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.0/modules/datasets.html b/v0.1.0/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.0/modules/datasets.html
+++ b/v0.1.0/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.0/modules/io.html b/v0.1.0/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.0/modules/io.html
+++ b/v0.1.0/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.0/modules/models.html b/v0.1.0/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.0/modules/models.html
+++ b/v0.1.0/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/transforms.html b/v0.1.0/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.0/modules/transforms.html
+++ b/v0.1.0/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
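For orientation, here is a minimal sketch of the object shape behind that minified payload, assuming the standard Sphinx search-index layout; the key names and value shapes mirror the payload itself, while the sample entries are illustrative only:

```js
// Sketch only — not part of the diff. Shape of the minified payload that
// searchtools.js consumes via Search.setIndex (assuming the standard
// Sphinx search-index layout; sample entries are illustrative).
Search.setIndex({
  alltitles: { "Installation": [[4, null]] },            // section title -> [doc index, anchor]
  docnames: ["changelog", "getting_started/installing"], // source docs, referenced by index below
  filenames: ["changelog.rst", "getting_started/installing.rst"],
  terms: { detector: [5, 9, 16] },                       // stemmed term -> docs containing it
  titles: ["Changelog", "Installation"],                 // display titles, same order as docnames
  titleterms: { instal: [3, 4] },                        // stemmed term -> docs whose title matches
});
```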
diff --git a/v0.1.0/using_doctr/custom_models_training.html b/v0.1.0/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.0/using_doctr/custom_models_training.html
+++ b/v0.1.0/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.0/using_doctr/running_on_aws.html b/v0.1.0/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.0/using_doctr/running_on_aws.html
+++ b/v0.1.0/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
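The searchtools.js hunks above change the search-result record from a 6-tuple to a 7-tuple: every producer now appends a SearchResultKind value ("index", "object", "text", or "title"), _displayItem turns it into a `kind-*` CSS class on the rendered list item, and _finishSearch switches to Documentation.ngettext so the result count pluralizes correctly. A minimal sketch of how the new field flows through (the sample result values are illustrative):

```js
// Sketch only — not part of the diff. How the new trailing `kind` field
// travels from a result record into a theme-stylable CSS class.
// Record layout per the updated comment in searchtools.js:
//   [docname, title, anchor, descr, score, filename, kind]
const result = ["getting_started/installing", "Installation", "",
                null, 15, "installing.html", SearchResultKind.title];
const [docName, title, anchor, descr, score, _filename, kind] = result;

const listItem = document.createElement("li");
listItem.classList.add(`kind-${kind}`); // renders as <li class="kind-title">…</li>
```

A theme stylesheet can then distinguish title hits from full-text hits with selectors such as `li.kind-title` versus `li.kind-text`, which is exactly what the comment added alongside SearchResultKind suggests.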
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
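For reference, a minimal usage sketch of the three loading modes introduced by the updated CORD constructor above (assuming the archives download successfully; as the guard above enforces, `recognition_task` and `detection_task` are mutually exclusive). Inspecting `.data` shows the raw samples as stored by each mode:

>>> from doctr.datasets import CORD
>>> # default: image paths paired with a dict of boxes and labels
>>> train_set = CORD(train=True, download=True)
>>> img_path, target = train_set.data[0]  # target = dict(boxes=..., labels=...)
>>> # recognition mode: pre-cropped word images paired with their text
>>> reco_set = CORD(train=True, download=True, recognition_task=True)
>>> crop, word = reco_set.data[0]
>>> # detection mode: image paths paired with box coordinates only
>>> det_set = CORD(train=True, download=True, detection_task=True)
>>> img_path, boxes = det_set.data[0]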
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets.core - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
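To illustrate the download-and-extract flow implemented by this removed VisionDataset, a hypothetical subclass sketch (the URL is a placeholder, not a real release asset, and hash verification is skipped):

>>> from doctr.datasets.core import VisionDataset
>>> class TinySet(VisionDataset):
...     """Hypothetical dataset: fetches an archive, extracts it, then lists samples."""
...     def __init__(self, **kwargs):
...         super().__init__(
...             url="https://example.com/tinyset.zip",  # placeholder URL
...             file_hash=None,  # no SHA256 check in this sketch
...             extract_archive=True,
...             download=True,
...             **kwargs,
...         )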
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
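The `use_polygons` branch above expands each straight box into its four corners; the same mapping on a dummy box, as a standalone check:

>>> box = [10, 20, 50, 40]  # xmin, ymin, xmax, ymax
>>> polygon = [
...     [box[0], box[1]],  # top left
...     [box[2], box[1]],  # top right
...     [box[2], box[3]],  # bottom right
...     [box[0], box[3]],  # bottom left
... ]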
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
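A short sketch of the reworked DataLoader above: `__len__` now reports the number of batches, and `collate_fn` replaces the removed `workers` argument (the dataset download is assumed to succeed):

>>> from doctr.datasets import CORD, DataLoader
>>> train_set = CORD(train=True, download=True)
>>> train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
>>> len(train_loader)  # number of batches per epoch
>>> images, targets = next(iter(train_loader))
>>> # a custom merge strategy can now be injected instead of the default stacking:
>>> list_loader = DataLoader(train_set, batch_size=16, collate_fn=lambda samples: samples)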
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
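The polygon-to-box reduction used above relies on taking per-corner minima and maxima; on a toy (N, 4, 2) array:

>>> import numpy as np
>>> coords = np.array([[[1, 2], [5, 2], [5, 6], [1, 6]]], dtype=np.float32)  # shape (1, 4, 2)
>>> # reduce each polygon to xmin, ymin, xmax, ymax
>>> np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
array([[1., 2., 5., 6.]], dtype=float32)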
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char still not in vocab, return unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their class names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
-
-
+
+
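A worked sketch of the refactored `encode_sequences` above, on a toy three-character vocab (indices 0-2, leaving 3 and 4 free for EOS and PAD):

>>> from doctr.datasets.utils import encode_sequences
>>> encode_sequences(["ab", "c"], vocab="abc", eos=3)
array([[0, 1, 3],
       [2, 3, 3]], dtype=int32)
>>> # with a dedicated padding symbol, each word is followed by one EOS before padding
>>> encode_sequences(["ab", "c"], vocab="abc", eos=3, pad=4)
array([[0, 1, 3, 4],
       [2, 3, 4, 4]], dtype=int32)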
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents.elements - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
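-
-# Hedged end-to-end sketch of the element hierarchy, from Word up to
-# Document (geometries and dimensions are illustrative):
-# >>> word = Word("hello", 0.99, ((0.1, 0.1), (0.3, 0.2)))
-# >>> doc = Document([Page([Block([Line([word])])], page_idx=0, dimensions=(896, 1280))])
-# >>> doc.render()
-# 'hello'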
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents.reader - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of the image in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
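-
-# Hedged sketch: read_img also accepts raw bytes, e.g. the content of an
-# open file (the path is a placeholder).
-# >>> with open("path/to/your/doc.jpg", "rb") as f:
-# ...     page = read_img(f.read(), output_size=(1024, 1024))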
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document loaded with PyMuPDF (fitz)
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 pdf;
- to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the original one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
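-
-# Hedged sketch: rendering the first page of a PDF at the default scales,
-# i.e. twice the 72 dpi base resolution (the path is a placeholder).
-# >>> doc = read_pdf("path/to/your/doc.pdf")
-# >>> page = convert_page_to_numpy(doc[0], rgb_output=True)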
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.models.detection.differentiable_binarization - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to expand (unshrink) polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: The first parameter.
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
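-
-# Hedged numeric sketch: expanding a 10 x 10 square with the default
-# unclip_ratio of 1.5 offsets its edges by area * ratio / perimeter = 3.75px,
-# so the returned (x, y, w, h) box is roughly (-4, -4, 18, 18).
-# >>> import numpy as np
-# >>> proc = DBPostProcessor()
-# >>> proc.polygon_to_box(np.array([[0, 0], [10, 0], [10, 10], [0, 10]]))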
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channels to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum (top-down, from the coarsest map down to the finest)
- for idx in range(len(results) - 2, -1, -1):
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
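-
-# Hedged shape sketch: the FPN expects a fine-to-coarse list of 4 feature
-# maps whose spatial size halves at each level; every level is upsampled
-# back to the finest resolution before concatenation (shapes illustrative).
-# >>> fpn = FeaturePyramidNetwork(channels=128)
-# >>> fmaps = [tf.random.uniform((1, 256 // 2 ** i, 256 // 2 ** i, 64)) for i in range(4)]
-# >>> fpn(fmaps).shape
-# TensorShape([1, 256, 256, 512])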
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature map is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.ndarray,
- ys: np.ndarray,
- a: np.ndarray,
- b: np.ndarray,
- eps: float = 1e-7,
- ) -> np.ndarray:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- the distance map of each (xs, ys) point to the [ab] segment, of shape (height, width)
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
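-
-# Hedged numeric sketch: distances from a 3 x 5 grid to the segment from
-# (0, 0) to (4, 0). The first row lies on the segment (distance ~0, up to
-# the eps regularizer); points past the endpoints fall back to the distance
-# to the nearest endpoint.
-# >>> xs, ys = np.meshgrid(np.arange(5.), np.arange(3.))
-# >>> dist = DBNet.compute_distance(xs, ys, np.array([0., 0.]), np.array([4., 0.]))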
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon : array of coord., to draw the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.models.detection.linknet - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from the LinkNet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num + 1):
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
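-
-# Hedged sketch: turning a synthetic probability map into relative boxes
-# (the binarization threshold 0.15 mirrors the default bin_thresh above, and
-# box_score is assumed from the parent post-processor).
-# >>> proc = LinkNetPostProcessor()
-# >>> pred = np.zeros((64, 64), dtype=np.float32)
-# >>> pred[8:24, 8:40] = 0.9
-# >>> boxes = proc.bitmap_to_boxes(pred, pred > 0.15)  # one (xmin, ymin, xmax, ymax, score) row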
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
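-
-# Hedged shape check: a decoder block maps in_chan to out_chan channels and
-# doubles the spatial size (shapes illustrative).
-# >>> block = decoder_block(in_chan=128, out_chan=64)
-# >>> block(tf.zeros((1, 32, 32, 128))).shape
-# TensorShape([1, 64, 64, 64])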
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.models.export - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
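-
-# Hedged follow-up to the examples above: the bytes returned by any of these
-# converters can be written straight to a .tflite file (the filename is a
-# placeholder).
-# >>> with open("model_int8.tflite", "wb") as f:
-# ...     f.write(serialized_model)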
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.models.recognition.crnn - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
-        Performs decoding of raw output with CTC and decoding of CTC predictions
-        with the label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
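For intuition, greedy CTC decoding keeps the argmax class per timestep, merges consecutive repeats, then drops the blank index (len(vocab) here); a hypothetical pure-Python sketch of the same rule:

>>> vocab = "abc"                # blank index = len(vocab) = 3
>>> steps = [0, 0, 3, 1, 1]      # per-timestep argmax: a, a, blank, b, b
>>> merged = [k for i, k in enumerate(steps) if i == 0 or k != steps[i - 1]]
>>> "".join(vocab[k] for k in merged if k != len(vocab))
'ab'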
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
-        Args:
-            model_output: predicted logits of the model
-            target: list of ground-truth labels, encoded internally into (gt, seq_len) via compute_target
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
-        # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
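A shape walkthrough with illustrative dimensions (the values are arbitrary; only the broadcasting pattern matters):

>>> import tensorflow as tf
>>> attn = AttentionModule(attention_units=64)
>>> feats = tf.random.uniform((2, 8, 32, 128))   # (N, H, W, C) feature map
>>> hidden = tf.random.uniform((2, 1, 1, 512))   # (N, 1, 1, rnn_units) hidden state
>>> attn(feats, hidden).shape                    # one glimpse vector per sample
TensorShape([2, 128])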
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embeded_symbol: shape (N, embedding_units)
- embeded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embeded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
-            # logits: shape (N, rnn_units), glimpse: shape (N, C)
-            logits = tf.concat([logits, glimpse], axis=-1)
-            # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
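The masking step relies on tf.sequence_mask, which marks only the first seq_len timesteps of each row as valid so that everything after <eos> is ignored; a quick illustration:

>>> import tensorflow as tf
>>> tf.sequence_mask([2, 3], maxlen=4).numpy()
array([[ True,  True, False, False],
       [ True,  True,  True, False]])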
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example:
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
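Since arch now accepts either a name or a model instance, both call styles below are valid; a hedged sketch (random crops stand in for real word images):

>>> import numpy as np
>>> from doctr.models import crnn_vgg16_bn, recognition_predictor
>>> predictor = recognition_predictor("crnn_vgg16_bn", pretrained=True)
>>> predictor = recognition_predictor(arch=crnn_vgg16_bn(pretrained=True))
>>> crops = [(255 * np.random.rand(32, 128, 3)).astype(np.uint8)]
>>> out = predictor(crops)  # one (word, confidence) pair per crop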
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the general page orientation based on the median line
+ orientation of the segmentation map, then rotates the page before passing it again to the
+ deep learning detection module. Doing so improves performance for documents with
+ page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the general page orientation based on the median line
+ orientation of the segmentation map, then rotates the page before passing it again to the
+ deep learning detection module. Doing so improves performance for documents with
+ page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
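Unlike ocr_predictor, the KIE result groups detections by predicted class rather than by block and line; assuming the KIEPredictor output object from the docstring example above, access looks roughly like this:

>>> page = out.pages[0]
>>> for class_name, predictions in page.predictions.items():
...     print(class_name, len(predictions))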
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
-        >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
-        >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
-    Example::
-        >>> from doctr.transforms import RandomBrightness
-        >>> import tensorflow as tf
-        >>> transfo = RandomBrightness()
-        >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
-    Args:
-        max_delta: the offset added to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
-    Example::
-        >>> from doctr.transforms import RandomContrast
-        >>> import tensorflow as tf
-        >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
-    Example::
-        >>> from doctr.transforms import RandomSaturation
-        >>> import tensorflow as tf
-        >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
-        >>> from doctr.transforms import RandomHue
-        >>> import tensorflow as tf
-        >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Gamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
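Under the hood, tf.image.adjust_gamma computes out = gain * img ** gamma, so gamma > 1 darkens mid-tones and gamma < 1 brightens them; for example:

>>> import tensorflow as tf
>>> tf.image.adjust_gamma(tf.constant([[0.25]]), gamma=2.0, gain=1.0).numpy()
array([[0.0625]], dtype=float32)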
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = JpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
-        >>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
-        >>> import tensorflow as tf
-        >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
-        >>> from doctr.transforms import RandomApply, RandomGamma
-        >>> import tensorflow as tf
-        >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
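A worked example of the four tolerance levels (anyascii transliterates "é" to "e"):

>>> string_match("café", "CAFE")
(False, False, False, True)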
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
-        ignore_accents: if true, ignore accent errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
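A minimal end-to-end sketch of the metric above; the expected values follow from the `string_match` semantics (only "world" matches exactly, while both pairs match caselessly):

>>> from doctr.utils import TextMatch
>>> metric = TextMatch()
>>> metric.update(["Hello", "world"], ["hello", "world"])
>>> metric.summary()
{'raw': 0.5, 'caseless': 1.0, 'anyascii': 0.5, 'unicase': 1.0}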
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
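A worked example with hypothetical boxes (assuming `box_iou` is imported from `doctr.utils.metrics`): the second box covers half the first, so the intersection is 50 * 100 = 5000 over a union of 10000, i.e. an IoU of 0.5.

>>> import numpy as np
>>> from doctr.utils.metrics import box_iou
>>> box_iou(np.array([[0, 0, 100, 100]]), np.array([[0, 0, 50, 100]]))
array([[0.5]])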
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
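A sketch with two hypothetical unit squares offset by half a side: the intersection is 0.5 and the union 1.5, so the IoU is 1/3.

>>> import numpy as np
>>> from doctr.utils.metrics import polygon_iou
>>> sq1 = np.array([[[0, 0], [1, 0], [1, 1], [0, 1]]], dtype=np.float32)
>>> sq2 = sq1 + np.array([0.5, 0.0], dtype=np.float32)  # shift along x
>>> polygon_iou(sq1, sq2)
array([[0.33333334]], dtype=float32)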
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
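A minimal sketch of the suppression behaviour: the second box overlaps the highest-scored one with an IoU of roughly 0.9 and is dropped, while the disjoint third box survives.

>>> import numpy as np
>>> from doctr.utils.metrics import nms
>>> boxes = np.array([
>>>     [0, 0, 100, 100, 0.9],
>>>     [5, 5, 100, 100, 0.8],
>>>     [200, 200, 300, 300, 0.7],
>>> ])
>>> nms(boxes, thresh=0.5)  # indices [0, 2] are kept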
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
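A usage sketch with relative coordinates: the single ground truth matches the first of two predictions, so recall is 1.0, precision is 0.5, and the mean IoU is averaged over the two predictions.

>>> import numpy as np
>>> from doctr.utils import LocalizationConfusion
>>> metric = LocalizationConfusion(iou_thresh=0.5)
>>> metric.update(np.array([[0, 0, 0.5, 0.5]]), np.array([[0, 0, 0.5, 0.5], [0.6, 0.6, 0.8, 0.8]]))
>>> metric.summary()
(1.0, 0.5, 0.5)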
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (over all detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
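A usage sketch combining box assignment and string comparison: the matched pair differs only by letter case, so the raw recall is 0 while the caseless recall is 1.

>>> import numpy as np
>>> from doctr.utils import OCRMetric
>>> metric = OCRMetric(iou_thresh=0.5)
>>> metric.update(np.array([[0, 0, 0.5, 0.5]]), np.array([[0, 0, 0.5, 0.5], [0.6, 0.6, 0.8, 0.8]]),
>>>               ["Hello"], ["hello", "world"])
>>> recall, precision, mean_iou = metric.summary()
>>> (recall["raw"], recall["caseless"])
(0.0, 1.0)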
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
 # mean IoU (over all detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
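A usage sketch mirroring the docstring example with relative coordinates: only the class-consistent, sufficiently overlapping pair counts as a match, giving a recall of 1.0 and a precision of 0.5.

>>> import numpy as np
>>> from doctr.utils import DetectionMetric
>>> metric = DetectionMetric(iou_thresh=0.5)
>>> metric.update(np.array([[0, 0, 0.5, 0.5]]), np.array([[0, 0, 0.5, 0.5], [0.6, 0.6, 0.8, 0.8]]),
>>>               np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
>>> metric.summary()
(1.0, 0.5, 0.5)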
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
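A minimal dispatch sketch (hypothetical relative coordinates): a 2-point tuple is routed to `rect_patch`, while a (4, 2) array goes to `polygon_patch`.

>>> import numpy as np
>>> rect = create_obj_patch(((0.1, 0.1), (0.4, 0.3)), page_dimensions=(200, 300))
>>> poly = create_obj_patch(np.array([[0.1, 0.1], [0.4, 0.1], [0.4, 0.3], [0.1, 0.3]]), page_dimensions=(200, 300))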
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
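For instance (lightness and saturation are randomized, so exact values vary between calls):

>>> get_colors(3)  # three (r, g, b) tuples with hues spaced 120° apart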
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
 image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mlp Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mlp Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
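A usage sketch on a hypothetical blank image (requires OpenCV and matplotlib; the relative boxes are converted to absolute pixels internally):

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> image = np.zeros((200, 300, 3), dtype=np.uint8)
>>> draw_boxes(np.array([[0.1, 0.1, 0.4, 0.3]]), image, color=(0, 255, 0))
>>> plt.show()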
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
Overview: module code - docTR documentation
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-..autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developper mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Either performed at once or separately, to each task corresponds a type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors lets you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, we warm-up the model and then we measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection will be used to produces cropped images that will be passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedure. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module regroups non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model performances.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
- doctr.datasets - docTR documentation
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before passing it to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-Name          | Size | Characters
-digits        | 10   | 0123456789
-ascii_letters | 52   | abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-punctuation   | 32   | !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-currency      | 5    | £€¥¢฿
-latin         | 96   | 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-french        | 154  | 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
-
-
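-For instance, a minimal sketch (the vocab string below is a toy assumption for illustration):
->>> from doctr.datasets import encode_sequences
->>> encoded = encode_sequences(["hello", "world"], vocab="abcdefghijklmnopqrstuvwxyz", target_size=16)
-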
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
- doctr.documents - docTR documentation
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page's size
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (for instance, on a two-column page, words on the same horizontal level but in different columns belong to two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the rotation angle value in degrees and the confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
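-Since the structure is fully nested, its textual content can be traversed directly. For instance, a sketch assuming doc is a Document produced by an OCR predictor:
-
->>> words = [word.value for page in doc.pages for block in page.blocks for line in block.lines for word in line.words]
-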
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into images in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF document as a bytes stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
Args:¶
-
+
diff --git a/modules/utils.html b/modules/utils.html
index 3dd3ecbd96..6784d81f6f 100644
--- a/modules/utils.html
+++ b/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -711,7 +711,7 @@ Args:¶
-
+
diff --git a/notebooks.html b/notebooks.html
index f3ea994e49..647f73d4eb 100644
--- a/notebooks.html
+++ b/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -387,7 +387,7 @@ docTR Notebooks
-
+
diff --git a/search.html b/search.html
index f0693e2c97..0e0da5efb3 100644
--- a/search.html
+++ b/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -336,7 +336,7 @@
-
+
diff --git a/searchindex.js b/searchindex.js
index 8598997441..df18967072 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[1, "correction"]], "2. Warning": [[1, "warning"]], "3. Temporary Ban": [[1, "temporary-ban"]], "4. Permanent Ban": [[1, "permanent-ban"]], "AWS Lambda": [[13, null]], "Advanced options": [[18, "advanced-options"]], "Args:": [[6, "args"], [6, "id4"], [6, "id7"], [6, "id10"], [6, "id13"], [6, "id16"], [6, "id19"], [6, "id22"], [6, "id25"], [6, "id29"], [6, "id32"], [6, "id37"], [6, "id40"], [6, "id46"], [6, "id49"], [6, "id50"], [6, "id51"], [6, "id54"], [6, "id57"], [6, "id60"], [6, "id61"], [7, "args"], [7, "id2"], [7, "id3"], [7, "id4"], [7, "id5"], [7, "id6"], [7, "id7"], [7, "id10"], [7, "id12"], [7, "id14"], [7, "id16"], [7, "id20"], [7, "id24"], [7, "id28"], [8, "args"], [8, "id3"], [8, "id8"], [8, "id13"], [8, "id17"], [8, "id21"], [8, "id26"], [8, "id31"], [8, "id36"], [8, "id41"], [8, "id46"], [8, "id50"], [8, "id54"], [8, "id59"], [8, "id63"], [8, "id68"], [8, "id73"], [8, "id77"], [8, "id81"], [8, "id85"], [8, "id90"], [8, "id95"], [8, "id99"], [8, "id104"], [8, "id109"], [8, "id114"], [8, "id119"], [8, "id123"], [8, "id127"], [8, "id132"], [8, "id137"], [8, "id142"], [8, "id146"], [8, "id150"], [8, "id155"], [8, "id159"], [8, "id163"], [8, "id167"], [8, "id169"], [8, "id171"], [8, "id173"], [9, "args"], [9, "id1"], [9, "id2"], [9, "id3"], [9, "id4"], [9, "id5"], [9, "id6"], [9, "id7"], [9, "id8"], [9, "id9"], [9, "id10"], [9, "id11"], [9, "id12"], [9, "id13"], [9, "id14"], [9, "id15"], [9, "id16"], [9, "id17"], [9, "id18"], [9, "id19"], [10, "args"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"]], "Artefact": [[7, "artefact"]], "ArtefactDetection": [[15, "artefactdetection"]], "Attribution": [[1, "attribution"]], "Available Datasets": [[16, "available-datasets"]], "Available architectures": [[18, "available-architectures"], [18, "id1"], [18, "id2"]], "Available contribution modules": [[15, "available-contribution-modules"]], "Block": [[7, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[16, null]], "Choosing the right model": [[18, null]], "Classification": [[14, "classification"]], "Code quality": [[2, "code-quality"]], "Code style verification": [[2, "code-style-verification"]], "Codebase structure": [[2, "codebase-structure"]], "Commits": [[2, "commits"]], "Composing transformations": [[9, "composing-transformations"]], "Continuous Integration": [[2, "continuous-integration"]], "Contributing to docTR": [[2, null]], "Contributor Covenant Code of Conduct": [[1, null]], "Custom dataset loader": [[6, "custom-dataset-loader"]], "Custom orientation classification models": [[12, "custom-orientation-classification-models"]], "Data Loading": [[16, "data-loading"]], "Dataloader": [[6, "dataloader"]], "Detection": [[14, "detection"], [16, "detection"]], "Detection predictors": [[18, "detection-predictors"]], "Developer mode installation": [[2, "developer-mode-installation"]], "Developing docTR": [[2, "developing-doctr"]], "Document": [[7, "document"]], "Document structure": [[7, "document-structure"]], "End-to-End OCR": [[18, "end-to-end-ocr"]], "Enforcement": [[1, "enforcement"]], "Enforcement Guidelines": [[1, "enforcement-guidelines"]], "Enforcement Responsibilities": [[1, "enforcement-responsibilities"]], "Export to ONNX": [[17, "export-to-onnx"]], "Feature requests & bug report": [[2, "feature-requests-bug-report"]], "Feedback": [[2, "feedback"]], "File reading": [[7, "file-reading"]], "Half-precision": [[17, "half-precision"]], "Installation": [[3, null]], 
"Integrate contributions into your pipeline": [[15, null]], "Let\u2019s connect": [[2, "let-s-connect"]], "Line": [[7, "line"]], "Loading from Huggingface Hub": [[14, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[12, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[12, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[4, "main-features"]], "Model optimization": [[17, "model-optimization"]], "Model zoo": [[4, "model-zoo"]], "Modifying the documentation": [[2, "modifying-the-documentation"]], "Naming conventions": [[14, "naming-conventions"]], "OCR": [[16, "ocr"]], "Object Detection": [[16, "object-detection"]], "Our Pledge": [[1, "our-pledge"]], "Our Standards": [[1, "our-standards"]], "Page": [[7, "page"]], "Preparing your model for inference": [[17, null]], "Prerequisites": [[3, "prerequisites"]], "Pretrained community models": [[14, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[14, "pushing-to-the-huggingface-hub"]], "Questions": [[2, "questions"]], "Recognition": [[14, "recognition"], [16, "recognition"]], "Recognition predictors": [[18, "recognition-predictors"]], "Returns:": [[6, "returns"], [7, "returns"], [7, "id11"], [7, "id13"], [7, "id15"], [7, "id19"], [7, "id23"], [7, "id27"], [7, "id31"], [8, "returns"], [8, "id6"], [8, "id11"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id29"], [8, "id34"], [8, "id39"], [8, "id44"], [8, "id49"], [8, "id53"], [8, "id57"], [8, "id62"], [8, "id66"], [8, "id71"], [8, "id76"], [8, "id80"], [8, "id84"], [8, "id88"], [8, "id93"], [8, "id98"], [8, "id102"], [8, "id107"], [8, "id112"], [8, "id117"], [8, "id122"], [8, "id126"], [8, "id130"], [8, "id135"], [8, "id140"], [8, "id145"], [8, "id149"], [8, "id153"], [8, "id158"], [8, "id162"], [8, "id166"], [8, "id168"], [8, "id170"], [8, "id172"], [10, "returns"]], "Scope": [[1, "scope"]], "Share your model with the community": [[14, null]], "Supported Vocabs": [[6, "supported-vocabs"]], "Supported contribution modules": [[5, "supported-contribution-modules"]], "Supported datasets": [[4, "supported-datasets"]], "Supported transformations": [[9, "supported-transformations"]], "Synthetic dataset generator": [[6, "synthetic-dataset-generator"], [16, "synthetic-dataset-generator"]], "Task evaluation": [[10, "task-evaluation"]], "Text Detection": [[18, "text-detection"]], "Text Recognition": [[18, "text-recognition"]], "Text detection models": [[4, "text-detection-models"]], "Text recognition models": [[4, "text-recognition-models"]], "Train your own model": [[12, null]], "Two-stage approaches": [[18, "two-stage-approaches"]], "Unit tests": [[2, "unit-tests"]], "Use your own datasets": [[16, "use-your-own-datasets"]], "Using your ONNX exported model": [[17, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[3, "via-conda-only-for-linux"]], "Via Git": [[3, "via-git"]], "Via Python Package": [[3, "via-python-package"]], "Visualization": [[10, "visualization"]], "What should I do with the output?": [[18, "what-should-i-do-with-the-output"]], "Word": [[7, "word"]], "docTR Notebooks": [[11, null]], "docTR Vocabs": [[6, "id62"]], "docTR: Document Text Recognition": [[4, null]], "doctr.contrib": [[5, null]], "doctr.datasets": [[6, null], [6, "datasets"]], "doctr.io": [[7, null]], "doctr.models": [[8, null]], "doctr.models.classification": [[8, "doctr-models-classification"]], "doctr.models.detection": [[8, "doctr-models-detection"]], "doctr.models.factory": [[8, 
"doctr-models-factory"]], "doctr.models.recognition": [[8, "doctr-models-recognition"]], "doctr.models.zoo": [[8, "doctr-models-zoo"]], "doctr.transforms": [[9, null]], "doctr.utils": [[10, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[7, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[7, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[9, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[6, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[9, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[9, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[6, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[6, "doctr.datasets.loader.DataLoader", false]], 
"db_mobilenet_v3_large() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[8, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[6, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[6, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[7, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[7, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[6, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[6, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[9, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[9, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[6, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[6, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[6, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[6, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[6, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[8, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[9, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[7, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[6, "doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() 
(in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[9, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[8, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[6, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[9, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[7, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[9, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[9, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[9, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[9, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[9, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[9, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[9, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[9, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[9, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[9, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[9, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[9, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[7, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[7, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[7, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[6, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[9, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[8, 
"doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[7, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[7, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[6, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[6, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[6, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[6, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[9, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[10, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[6, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[7, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[6, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[6, 0, 1, "", "CORD"], [6, 0, 1, "", "CharacterGenerator"], [6, 0, 1, "", "DetectionDataset"], [6, 0, 1, "", "DocArtefacts"], [6, 0, 1, "", "FUNSD"], [6, 0, 1, "", "IC03"], [6, 0, 1, "", "IC13"], [6, 0, 1, "", "IIIT5K"], [6, 0, 1, "", "IIITHWS"], [6, 0, 1, "", "IMGUR5K"], [6, 0, 1, "", "MJSynth"], [6, 0, 1, "", "OCRDataset"], [6, 0, 1, "", "RecognitionDataset"], [6, 0, 1, "", "SROIE"], [6, 0, 1, "", "SVHN"], [6, 0, 1, "", "SVT"], [6, 0, 1, "", "SynthText"], [6, 0, 1, "", 
"WILDRECEIPT"], [6, 0, 1, "", "WordGenerator"], [6, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[6, 0, 1, "", "DataLoader"]], "doctr.io": [[7, 0, 1, "", "Artefact"], [7, 0, 1, "", "Block"], [7, 0, 1, "", "Document"], [7, 0, 1, "", "DocumentFile"], [7, 0, 1, "", "Line"], [7, 0, 1, "", "Page"], [7, 0, 1, "", "Word"], [7, 1, 1, "", "decode_img_as_tensor"], [7, 1, 1, "", "read_html"], [7, 1, 1, "", "read_img_as_numpy"], [7, 1, 1, "", "read_img_as_tensor"], [7, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[7, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[7, 2, 1, "", "from_images"], [7, 2, 1, "", "from_pdf"], [7, 2, 1, "", "from_url"]], "doctr.io.Page": [[7, 2, 1, "", "show"]], "doctr.models": [[8, 1, 1, "", "kie_predictor"], [8, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[8, 1, 1, "", "crop_orientation_predictor"], [8, 1, 1, "", "magc_resnet31"], [8, 1, 1, "", "mobilenet_v3_large"], [8, 1, 1, "", "mobilenet_v3_large_r"], [8, 1, 1, "", "mobilenet_v3_small"], [8, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [8, 1, 1, "", "mobilenet_v3_small_page_orientation"], [8, 1, 1, "", "mobilenet_v3_small_r"], [8, 1, 1, "", "page_orientation_predictor"], [8, 1, 1, "", "resnet18"], [8, 1, 1, "", "resnet31"], [8, 1, 1, "", "resnet34"], [8, 1, 1, "", "resnet50"], [8, 1, 1, "", "textnet_base"], [8, 1, 1, "", "textnet_small"], [8, 1, 1, "", "textnet_tiny"], [8, 1, 1, "", "vgg16_bn_r"], [8, 1, 1, "", "vit_b"], [8, 1, 1, "", "vit_s"]], "doctr.models.detection": [[8, 1, 1, "", "db_mobilenet_v3_large"], [8, 1, 1, "", "db_resnet50"], [8, 1, 1, "", "detection_predictor"], [8, 1, 1, "", "fast_base"], [8, 1, 1, "", "fast_small"], [8, 1, 1, "", "fast_tiny"], [8, 1, 1, "", "linknet_resnet18"], [8, 1, 1, "", "linknet_resnet34"], [8, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[8, 1, 1, "", "from_hub"], [8, 1, 1, "", "login_to_hub"], [8, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[8, 1, 1, "", "crnn_mobilenet_v3_large"], [8, 1, 1, "", "crnn_mobilenet_v3_small"], [8, 1, 1, "", "crnn_vgg16_bn"], [8, 1, 1, "", "master"], [8, 1, 1, "", "parseq"], [8, 1, 1, "", "recognition_predictor"], [8, 1, 1, "", "sar_resnet31"], [8, 1, 1, "", "vitstr_base"], [8, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[9, 0, 1, "", "ChannelShuffle"], [9, 0, 1, "", "ColorInversion"], [9, 0, 1, "", "Compose"], [9, 0, 1, "", "GaussianBlur"], [9, 0, 1, "", "GaussianNoise"], [9, 0, 1, "", "LambdaTransformation"], [9, 0, 1, "", "Normalize"], [9, 0, 1, "", "OneOf"], [9, 0, 1, "", "RandomApply"], [9, 0, 1, "", "RandomBrightness"], [9, 0, 1, "", "RandomContrast"], [9, 0, 1, "", "RandomCrop"], [9, 0, 1, "", "RandomGamma"], [9, 0, 1, "", "RandomHorizontalFlip"], [9, 0, 1, "", "RandomHue"], [9, 0, 1, "", "RandomJpegQuality"], [9, 0, 1, "", "RandomResize"], [9, 0, 1, "", "RandomRotate"], [9, 0, 1, "", "RandomSaturation"], [9, 0, 1, "", "RandomShadow"], [9, 0, 1, "", "Resize"], [9, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[10, 0, 1, "", "DetectionMetric"], [10, 0, 1, "", "LocalizationConfusion"], [10, 0, 1, "", "OCRMetric"], [10, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.OCRMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.visualization": [[10, 1, 1, "", "visualize_page"]]}, 
"objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [1, 7, 8, 10, 14, 17], "0": [1, 3, 6, 9, 10, 12, 15, 16, 18], "00": 18, "01": 18, "0123456789": 6, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "02562": 8, "03": 18, "035": 18, "0361328125": 18, "04": 18, "05": 18, "06": 18, "06640625": 18, "07": 18, "08": [9, 18], "09": 18, "0966796875": 18, "1": [6, 7, 8, 9, 10, 12, 16, 18], "10": [6, 10, 18], "100": [6, 9, 10, 16, 18], "1000": 18, "101": 6, "1024": [8, 12, 18], "104": 6, "106": 6, "108": 6, "1095": 16, "11": 18, "110": 10, "1107": 16, "114": 6, "115": 6, "1156": 16, "116": 6, "118": 6, "11800h": 18, "11th": 18, "12": 18, "120": 6, "123": 6, "126": 6, "1268": 16, "128": [8, 12, 17, 18], "13": 18, "130": 6, "13068": 16, "131": 6, "1337891": 16, "1357421875": 18, "1396484375": 18, "14": 18, "1420": 18, "14470v1": 6, "149": 16, "15": 18, "150": [10, 18], "1552": 18, "16": [8, 17, 18], "1630859375": 18, "1684": 18, "16x16": 8, "17": 18, "1778": 18, "1782": 18, "18": [8, 18], "185546875": 18, "1900": 18, "1910": 8, "19342": 16, "19370": 16, "195": 6, "19598": 16, "199": 18, "1999": 18, "2": [3, 4, 6, 7, 9, 15, 18], "20": 18, "200": 10, "2000": 16, "2003": [4, 6], "2012": 6, "2013": [4, 6], "2015": 6, "2019": 4, "207901": 16, "21": 18, "2103": 6, "2186": 16, "21888": 16, "22": 18, "224": [8, 9], "225": 9, "22672": 16, "229": [9, 16], "23": 18, "233": 16, "236": 6, "24": 18, "246": 16, "249": 16, "25": 18, "2504": 18, "255": [7, 8, 9, 10, 18], "256": 8, "257": 16, "26": 18, "26032": 16, "264": 12, "27": 18, "2700": 16, "2710": 18, "2749": 12, "28": 18, "287": 12, "29": 18, "296": 12, "299": 12, "2d": 18, "3": [3, 4, 7, 8, 9, 10, 17, 18], "30": 18, "300": 16, "3000": 16, "301": 12, "30595": 18, "30ghz": 18, "31": 8, "32": [6, 8, 9, 12, 16, 17, 18], "3232421875": 18, "33": [9, 18], "33402": 16, "33608": 16, "34": [8, 18], "340": 18, "3456": 18, "3515625": 18, "36": 18, "360": 16, "37": [6, 18], "38": 18, "39": 18, "4": [8, 9, 10, 18], "40": 18, "406": 9, "41": 18, "42": 18, "43": 18, "44": 18, "45": 18, "456": 9, "46": 18, "47": 18, "472": 16, "48": [6, 18], "485": 9, "49": 18, "49377": 16, "5": [6, 9, 10, 15, 18], "50": [8, 16, 18], "51": 18, "51171875": 18, "512": 8, "52": [6, 18], "529": 18, "53": 18, "54": 18, "540": 18, "5478515625": 18, "55": 18, "56": 18, "57": 18, "58": [6, 18], "580": 18, "5810546875": 18, "583": 18, "59": 18, "597": 18, "5k": [4, 6], "5m": 18, "6": [9, 18], "60": 9, "600": [8, 10, 18], "61": 18, "62": 18, "626": 16, "63": 18, "64": [8, 9, 18], "641": 18, "647": 16, "65": 18, "66": 18, "67": 18, "68": 18, "69": 18, "693": 12, "694": 12, "695": 12, "6m": 18, "7": 18, "70": [6, 10, 18], "707470": 16, "71": [6, 18], "7100000": 16, "7141797": 16, "7149": 16, "72": 18, "72dpi": 7, "73": 18, "73257": 16, "74": 18, "75": [9, 18], "7581382": 16, "76": 18, "77": 18, "772": 12, "772875": 16, "78": 18, "785": 12, "79": 18, "793533": 16, "796": 16, "798": 12, "7m": 18, "8": [8, 9, 18], "80": 18, "800": [8, 10, 16, 18], "81": 18, "82": 18, "83": 18, "84": 18, "849": 16, "85": 18, "8564453125": 18, "857": 18, "85875": 16, "86": 18, "8603515625": 18, "87": 18, "8707": 16, "88": 18, "89": 18, "9": [3, 9, 18], "90": 18, "90k": 6, "90kdict32px": 6, "91": 18, "914085328578949": 18, "92": 18, "93": 18, "94": [6, 18], "95": 
[10, 18], "9578408598899841": 18, "96": 18, "97": 18, "98": 18, "99": 18, "9949972033500671": 18, "A": [1, 2, 4, 6, 7, 8, 11, 17], "As": 2, "Be": 18, "Being": 1, "By": 13, "For": [1, 2, 3, 12, 18], "If": [2, 7, 8, 12, 18], "In": [2, 6, 16], "It": [9, 14, 15, 17], "Its": [4, 8], "No": [1, 18], "Of": 6, "Or": [15, 17], "The": [1, 2, 6, 7, 10, 13, 15, 16, 17, 18], "Then": 8, "To": [2, 3, 13, 14, 15, 17, 18], "_": [1, 6, 8], "__call__": 18, "_build": 2, "_i": 10, "ab": 6, "abc": 17, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "abdef": [6, 16], "abl": [16, 18], "about": [1, 16, 18], "abov": 18, "abstractdataset": 6, "abus": 1, "accept": 1, "access": [4, 7, 16, 18], "account": [1, 14], "accur": 18, "accuraci": 10, "achiev": 17, "act": 1, "action": 1, "activ": 4, "ad": [2, 8, 9], "adapt": 1, "add": [9, 10, 14, 18], "add_hook": 18, "add_label": 10, "addit": [2, 3, 7, 15, 18], "addition": [2, 18], "address": [1, 7], "adjust": 9, "advanc": 1, "advantag": 17, "advis": 2, "aesthet": [4, 6], "affect": 1, "after": [14, 18], "ag": 1, "again": 8, "aggreg": [10, 16], "aggress": 1, "align": [1, 7, 9], "all": [1, 2, 5, 6, 7, 9, 10, 15, 16, 18], "allow": [1, 17], "along": 18, "alreadi": [2, 17], "also": [1, 8, 14, 15, 16, 18], "alwai": 16, "an": [1, 2, 4, 6, 7, 8, 10, 15, 17, 18], "analysi": [7, 15], "ancient_greek": 6, "angl": [7, 9], "ani": [1, 6, 7, 8, 9, 10, 17, 18], "annot": 6, "anot": 16, "anoth": [8, 12, 16], "answer": 1, "anyascii": 10, "anyon": 4, "anyth": 15, "api": [2, 4], "apolog": 1, "apologi": 1, "app": 2, "appear": 1, "appli": [1, 6, 9], "applic": [4, 8], "appoint": 1, "appreci": 14, "appropri": [1, 2, 18], "ar": [1, 2, 3, 5, 6, 7, 9, 10, 11, 15, 16, 18], "arab": 6, "arabic_diacrit": 6, "arabic_lett": 6, "arabic_punctu": 6, "arbitrarili": [4, 8], "arch": [8, 14], "architectur": [4, 8, 14, 15], "area": 18, "argument": [6, 7, 8, 10, 12, 18], "around": 1, "arrai": [7, 9, 10], "art": [4, 15], "artefact": [10, 15, 18], "artefact_typ": 7, "artifici": [4, 6], "arxiv": [6, 8], "asarrai": 10, "ascii_lett": 6, "aspect": [4, 8, 9, 18], "assess": 10, "assign": 10, "associ": 7, "assum": 8, "assume_straight_pag": [8, 12, 18], "astyp": [8, 10, 18], "attack": 1, "attend": [4, 8], "attent": [1, 8], "autom": 4, "automat": 18, "autoregress": [4, 8], "avail": [1, 4, 5, 9], "averag": [9, 18], "avoid": [1, 3], "aw": [4, 18], "awar": 18, "azur": 18, "b": [8, 10, 18], "b_j": 10, "back": 2, "backbon": 8, "backend": 18, "background": 16, "bangla": 6, "bar": 15, "bar_cod": 16, "base": [4, 8, 15], "baselin": [4, 8, 18], "batch": [6, 8, 9, 15, 16, 18], "batch_siz": [6, 12, 15, 16, 17], "bblanchon": 3, "bbox": 18, "becaus": 13, "been": [2, 10, 16, 18], "befor": [6, 8, 9, 18], "begin": 10, "behavior": [1, 18], "being": [10, 18], "belong": 18, "benchmark": 18, "best": 1, "better": [11, 18], "between": [9, 10, 18], "bgr": 7, "bilinear": 9, "bin_thresh": 18, "binar": [4, 8, 18], "binari": [7, 17, 18], "bit": 17, "block": [10, 18], "block_1_1": 18, "blur": 9, "bmvc": 6, "bn": 14, "bodi": [1, 18], "bool": [6, 7, 8, 9, 10], "boolean": [8, 18], "both": [4, 6, 9, 16, 18], "bottom": [8, 18], "bound": [6, 7, 8, 9, 10, 15, 16, 18], "box": [6, 7, 8, 9, 10, 15, 16, 18], "box_thresh": 18, "bright": 9, "browser": [2, 4], "build": [2, 3, 17], "built": 2, "byte": [7, 18], "c": [3, 7, 10], "c_j": 10, "cach": [2, 6, 13], "cache_sampl": 6, "call": 17, "callabl": [6, 9], "can": [2, 3, 12, 13, 14, 15, 16, 18], "capabl": [2, 11, 18], "case": [6, 10], "cf": 18, "cfg": 18, "challeng": 6, "challenge2_test_task12_imag": 6, 
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[1, "correction"]], "2. Warning": [[1, "warning"]], "3. Temporary Ban": [[1, "temporary-ban"]], "4. Permanent Ban": [[1, "permanent-ban"]], "AWS Lambda": [[13, null]], "Advanced options": [[18, "advanced-options"]], "Args:": [[6, "args"], [6, "id4"], [6, "id7"], [6, "id10"], [6, "id13"], [6, "id16"], [6, "id19"], [6, "id22"], [6, "id25"], [6, "id29"], [6, "id32"], [6, "id37"], [6, "id40"], [6, "id46"], [6, "id49"], [6, "id50"], [6, "id51"], [6, "id54"], [6, "id57"], [6, "id60"], [6, "id61"], [7, "args"], [7, "id2"], [7, "id3"], [7, "id4"], [7, "id5"], [7, "id6"], [7, "id7"], [7, "id10"], [7, "id12"], [7, "id14"], [7, "id16"], [7, "id20"], [7, "id24"], [7, "id28"], [8, "args"], [8, "id3"], [8, "id8"], [8, "id13"], [8, "id17"], [8, "id21"], [8, "id26"], [8, "id31"], [8, "id36"], [8, "id41"], [8, "id46"], [8, "id50"], [8, "id54"], [8, "id59"], [8, "id63"], [8, "id68"], [8, "id73"], [8, "id77"], [8, "id81"], [8, "id85"], [8, "id90"], [8, "id95"], [8, "id99"], [8, "id104"], [8, "id109"], [8, "id114"], [8, "id119"], [8, "id123"], [8, "id127"], [8, "id132"], [8, "id137"], [8, "id142"], [8, "id146"], [8, "id150"], [8, "id155"], [8, "id159"], [8, "id163"], [8, "id167"], [8, "id169"], [8, "id171"], [8, "id173"], [9, "args"], [9, "id1"], [9, "id2"], [9, "id3"], [9, "id4"], [9, "id5"], [9, "id6"], [9, "id7"], [9, "id8"], [9, "id9"], [9, "id10"], [9, "id11"], [9, "id12"], [9, "id13"], [9, "id14"], [9, "id15"], [9, "id16"], [9, "id17"], [9, "id18"], [9, "id19"], [10, "args"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"]], "Artefact": [[7, "artefact"]], "ArtefactDetection": [[15, "artefactdetection"]], "Attribution": [[1, "attribution"]], "Available Datasets": [[16, "available-datasets"]], "Available architectures": [[18, "available-architectures"], [18, "id1"], [18, "id2"]], "Available contribution modules": [[15, "available-contribution-modules"]], "Block": [[7, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[16, null]], "Choosing the right model": [[18, null]], "Classification": [[14, "classification"]], "Code quality": [[2, "code-quality"]], "Code style verification": [[2, "code-style-verification"]], "Codebase structure": [[2, "codebase-structure"]], "Commits": [[2, "commits"]], "Composing transformations": [[9, "composing-transformations"]], "Continuous Integration": [[2, "continuous-integration"]], "Contributing to docTR": [[2, null]], "Contributor Covenant Code of Conduct": [[1, null]], "Custom dataset loader": [[6, "custom-dataset-loader"]], "Custom orientation classification models": [[12, "custom-orientation-classification-models"]], "Data Loading": [[16, "data-loading"]], "Dataloader": [[6, "dataloader"]], "Detection": [[14, "detection"], [16, "detection"]], "Detection predictors": [[18, "detection-predictors"]], "Developer mode installation": [[2, "developer-mode-installation"]], "Developing docTR": [[2, "developing-doctr"]], "Document": [[7, "document"]], "Document structure": [[7, "document-structure"]], "End-to-End OCR": [[18, "end-to-end-ocr"]], "Enforcement": [[1, "enforcement"]], "Enforcement Guidelines": [[1, "enforcement-guidelines"]], "Enforcement Responsibilities": [[1, "enforcement-responsibilities"]], "Export to ONNX": [[17, "export-to-onnx"]], "Feature requests & bug report": [[2, "feature-requests-bug-report"]], "Feedback": [[2, "feedback"]], "File reading": [[7, "file-reading"]], "Half-precision": [[17, "half-precision"]], "Installation": [[3, null]], 
"Integrate contributions into your pipeline": [[15, null]], "Let\u2019s connect": [[2, "let-s-connect"]], "Line": [[7, "line"]], "Loading from Huggingface Hub": [[14, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[12, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[12, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[4, "main-features"]], "Model optimization": [[17, "model-optimization"]], "Model zoo": [[4, "model-zoo"]], "Modifying the documentation": [[2, "modifying-the-documentation"]], "Naming conventions": [[14, "naming-conventions"]], "OCR": [[16, "ocr"]], "Object Detection": [[16, "object-detection"]], "Our Pledge": [[1, "our-pledge"]], "Our Standards": [[1, "our-standards"]], "Page": [[7, "page"]], "Preparing your model for inference": [[17, null]], "Prerequisites": [[3, "prerequisites"]], "Pretrained community models": [[14, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[14, "pushing-to-the-huggingface-hub"]], "Questions": [[2, "questions"]], "Recognition": [[14, "recognition"], [16, "recognition"]], "Recognition predictors": [[18, "recognition-predictors"]], "Returns:": [[6, "returns"], [7, "returns"], [7, "id11"], [7, "id13"], [7, "id15"], [7, "id19"], [7, "id23"], [7, "id27"], [7, "id31"], [8, "returns"], [8, "id6"], [8, "id11"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id29"], [8, "id34"], [8, "id39"], [8, "id44"], [8, "id49"], [8, "id53"], [8, "id57"], [8, "id62"], [8, "id66"], [8, "id71"], [8, "id76"], [8, "id80"], [8, "id84"], [8, "id88"], [8, "id93"], [8, "id98"], [8, "id102"], [8, "id107"], [8, "id112"], [8, "id117"], [8, "id122"], [8, "id126"], [8, "id130"], [8, "id135"], [8, "id140"], [8, "id145"], [8, "id149"], [8, "id153"], [8, "id158"], [8, "id162"], [8, "id166"], [8, "id168"], [8, "id170"], [8, "id172"], [10, "returns"]], "Scope": [[1, "scope"]], "Share your model with the community": [[14, null]], "Supported Vocabs": [[6, "supported-vocabs"]], "Supported contribution modules": [[5, "supported-contribution-modules"]], "Supported datasets": [[4, "supported-datasets"]], "Supported transformations": [[9, "supported-transformations"]], "Synthetic dataset generator": [[6, "synthetic-dataset-generator"], [16, "synthetic-dataset-generator"]], "Task evaluation": [[10, "task-evaluation"]], "Text Detection": [[18, "text-detection"]], "Text Recognition": [[18, "text-recognition"]], "Text detection models": [[4, "text-detection-models"]], "Text recognition models": [[4, "text-recognition-models"]], "Train your own model": [[12, null]], "Two-stage approaches": [[18, "two-stage-approaches"]], "Unit tests": [[2, "unit-tests"]], "Use your own datasets": [[16, "use-your-own-datasets"]], "Using your ONNX exported model": [[17, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[3, "via-conda-only-for-linux"]], "Via Git": [[3, "via-git"]], "Via Python Package": [[3, "via-python-package"]], "Visualization": [[10, "visualization"]], "What should I do with the output?": [[18, "what-should-i-do-with-the-output"]], "Word": [[7, "word"]], "docTR Notebooks": [[11, null]], "docTR Vocabs": [[6, "id62"]], "docTR: Document Text Recognition": [[4, null]], "doctr.contrib": [[5, null]], "doctr.datasets": [[6, null], [6, "datasets"]], "doctr.io": [[7, null]], "doctr.models": [[8, null]], "doctr.models.classification": [[8, "doctr-models-classification"]], "doctr.models.detection": [[8, "doctr-models-detection"]], "doctr.models.factory": [[8, 
"doctr-models-factory"]], "doctr.models.recognition": [[8, "doctr-models-recognition"]], "doctr.models.zoo": [[8, "doctr-models-zoo"]], "doctr.transforms": [[9, null]], "doctr.utils": [[10, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[7, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[7, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[9, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[6, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[9, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[9, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[6, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[6, "doctr.datasets.loader.DataLoader", false]], 
"db_mobilenet_v3_large() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[8, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[6, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[6, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[7, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[7, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[6, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[6, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[9, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[9, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[6, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[6, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[6, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[6, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[6, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[8, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[9, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[7, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[6, "doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() 
(in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[9, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[8, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[6, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[9, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[7, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[9, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[9, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[9, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[9, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[9, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[9, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[9, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[9, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[9, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[9, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[9, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[9, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[7, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[7, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[7, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[6, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[9, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[8, 
"doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[8, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[7, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[7, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[6, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[6, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[6, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[6, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[8, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[9, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[10, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[10, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[10, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[10, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[8, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[10, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[8, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[6, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[7, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[6, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[6, 0, 1, "", "CORD"], [6, 0, 1, "", "CharacterGenerator"], [6, 0, 1, "", "DetectionDataset"], [6, 0, 1, "", "DocArtefacts"], [6, 0, 1, "", "FUNSD"], [6, 0, 1, "", "IC03"], [6, 0, 1, "", "IC13"], [6, 0, 1, "", "IIIT5K"], [6, 0, 1, "", "IIITHWS"], [6, 0, 1, "", "IMGUR5K"], [6, 0, 1, "", "MJSynth"], [6, 0, 1, "", "OCRDataset"], [6, 0, 1, "", "RecognitionDataset"], [6, 0, 1, "", "SROIE"], [6, 0, 1, "", "SVHN"], [6, 0, 1, "", "SVT"], [6, 0, 1, "", "SynthText"], [6, 0, 1, "", 
"WILDRECEIPT"], [6, 0, 1, "", "WordGenerator"], [6, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[6, 0, 1, "", "DataLoader"]], "doctr.io": [[7, 0, 1, "", "Artefact"], [7, 0, 1, "", "Block"], [7, 0, 1, "", "Document"], [7, 0, 1, "", "DocumentFile"], [7, 0, 1, "", "Line"], [7, 0, 1, "", "Page"], [7, 0, 1, "", "Word"], [7, 1, 1, "", "decode_img_as_tensor"], [7, 1, 1, "", "read_html"], [7, 1, 1, "", "read_img_as_numpy"], [7, 1, 1, "", "read_img_as_tensor"], [7, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[7, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[7, 2, 1, "", "from_images"], [7, 2, 1, "", "from_pdf"], [7, 2, 1, "", "from_url"]], "doctr.io.Page": [[7, 2, 1, "", "show"]], "doctr.models": [[8, 1, 1, "", "kie_predictor"], [8, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[8, 1, 1, "", "crop_orientation_predictor"], [8, 1, 1, "", "magc_resnet31"], [8, 1, 1, "", "mobilenet_v3_large"], [8, 1, 1, "", "mobilenet_v3_large_r"], [8, 1, 1, "", "mobilenet_v3_small"], [8, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [8, 1, 1, "", "mobilenet_v3_small_page_orientation"], [8, 1, 1, "", "mobilenet_v3_small_r"], [8, 1, 1, "", "page_orientation_predictor"], [8, 1, 1, "", "resnet18"], [8, 1, 1, "", "resnet31"], [8, 1, 1, "", "resnet34"], [8, 1, 1, "", "resnet50"], [8, 1, 1, "", "textnet_base"], [8, 1, 1, "", "textnet_small"], [8, 1, 1, "", "textnet_tiny"], [8, 1, 1, "", "vgg16_bn_r"], [8, 1, 1, "", "vit_b"], [8, 1, 1, "", "vit_s"]], "doctr.models.detection": [[8, 1, 1, "", "db_mobilenet_v3_large"], [8, 1, 1, "", "db_resnet50"], [8, 1, 1, "", "detection_predictor"], [8, 1, 1, "", "fast_base"], [8, 1, 1, "", "fast_small"], [8, 1, 1, "", "fast_tiny"], [8, 1, 1, "", "linknet_resnet18"], [8, 1, 1, "", "linknet_resnet34"], [8, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[8, 1, 1, "", "from_hub"], [8, 1, 1, "", "login_to_hub"], [8, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[8, 1, 1, "", "crnn_mobilenet_v3_large"], [8, 1, 1, "", "crnn_mobilenet_v3_small"], [8, 1, 1, "", "crnn_vgg16_bn"], [8, 1, 1, "", "master"], [8, 1, 1, "", "parseq"], [8, 1, 1, "", "recognition_predictor"], [8, 1, 1, "", "sar_resnet31"], [8, 1, 1, "", "vitstr_base"], [8, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[9, 0, 1, "", "ChannelShuffle"], [9, 0, 1, "", "ColorInversion"], [9, 0, 1, "", "Compose"], [9, 0, 1, "", "GaussianBlur"], [9, 0, 1, "", "GaussianNoise"], [9, 0, 1, "", "LambdaTransformation"], [9, 0, 1, "", "Normalize"], [9, 0, 1, "", "OneOf"], [9, 0, 1, "", "RandomApply"], [9, 0, 1, "", "RandomBrightness"], [9, 0, 1, "", "RandomContrast"], [9, 0, 1, "", "RandomCrop"], [9, 0, 1, "", "RandomGamma"], [9, 0, 1, "", "RandomHorizontalFlip"], [9, 0, 1, "", "RandomHue"], [9, 0, 1, "", "RandomJpegQuality"], [9, 0, 1, "", "RandomResize"], [9, 0, 1, "", "RandomRotate"], [9, 0, 1, "", "RandomSaturation"], [9, 0, 1, "", "RandomShadow"], [9, 0, 1, "", "Resize"], [9, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[10, 0, 1, "", "DetectionMetric"], [10, 0, 1, "", "LocalizationConfusion"], [10, 0, 1, "", "OCRMetric"], [10, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.OCRMetric": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[10, 2, 1, "", "summary"], [10, 2, 1, "", "update"]], "doctr.utils.visualization": [[10, 1, 1, "", "visualize_page"]]}, 
"objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [1, 7, 8, 10, 14, 17], "0": [1, 3, 6, 9, 10, 12, 15, 16, 18], "00": 18, "01": 18, "0123456789": 6, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "02562": 8, "03": 18, "035": 18, "0361328125": 18, "04": 18, "05": 18, "06": 18, "06640625": 18, "07": 18, "08": [9, 18], "09": 18, "0966796875": 18, "1": [6, 7, 8, 9, 10, 12, 16, 18], "10": [6, 10, 18], "100": [6, 9, 10, 16, 18], "1000": 18, "101": 6, "1024": [8, 12, 18], "104": 6, "106": 6, "108": 6, "1095": 16, "11": 18, "110": 10, "1107": 16, "114": 6, "115": 6, "1156": 16, "116": 6, "118": 6, "11800h": 18, "11th": 18, "12": 18, "120": 6, "123": 6, "126": 6, "1268": 16, "128": [8, 12, 17, 18], "13": 18, "130": 6, "13068": 16, "131": 6, "1337891": 16, "1357421875": 18, "1396484375": 18, "14": 18, "1420": 18, "14470v1": 6, "149": 16, "15": 18, "150": [10, 18], "1552": 18, "16": [8, 17, 18], "1630859375": 18, "1684": 18, "16x16": 8, "17": 18, "1778": 18, "1782": 18, "18": [8, 18], "185546875": 18, "1900": 18, "1910": 8, "19342": 16, "19370": 16, "195": 6, "19598": 16, "199": 18, "1999": 18, "2": [3, 4, 6, 7, 9, 15, 18], "20": 18, "200": 10, "2000": 16, "2003": [4, 6], "2012": 6, "2013": [4, 6], "2015": 6, "2019": 4, "207901": 16, "21": 18, "2103": 6, "2186": 16, "21888": 16, "22": 18, "224": [8, 9], "225": 9, "22672": 16, "229": [9, 16], "23": 18, "233": 16, "236": 6, "24": 18, "246": 16, "249": 16, "25": 18, "2504": 18, "255": [7, 8, 9, 10, 18], "256": 8, "257": 16, "26": 18, "26032": 16, "264": 12, "27": 18, "2700": 16, "2710": 18, "2749": 12, "28": 18, "287": 12, "29": 18, "296": 12, "299": 12, "2d": 18, "3": [3, 4, 7, 8, 9, 10, 17, 18], "30": 18, "300": 16, "3000": 16, "301": 12, "30595": 18, "30ghz": 18, "31": 8, "32": [6, 8, 9, 12, 16, 17, 18], "3232421875": 18, "33": [9, 18], "33402": 16, "33608": 16, "34": [8, 18], "340": 18, "3456": 18, "3515625": 18, "36": 18, "360": 16, "37": [6, 18], "38": 18, "39": 18, "4": [8, 9, 10, 18], "40": 18, "406": 9, "41": 18, "42": 18, "43": 18, "44": 18, "45": 18, "456": 9, "46": 18, "47": 18, "472": 16, "48": [6, 18], "485": 9, "49": 18, "49377": 16, "5": [6, 9, 10, 15, 18], "50": [8, 16, 18], "51": 18, "51171875": 18, "512": 8, "52": [6, 18], "529": 18, "53": 18, "54": 18, "540": 18, "5478515625": 18, "55": 18, "56": 18, "57": 18, "58": [6, 18], "580": 18, "5810546875": 18, "583": 18, "59": 18, "597": 18, "5k": [4, 6], "5m": 18, "6": [9, 18], "60": 9, "600": [8, 10, 18], "61": 18, "62": 18, "626": 16, "63": 18, "64": [8, 9, 18], "641": 18, "647": 16, "65": 18, "66": 18, "67": 18, "68": 18, "69": 18, "693": 12, "694": 12, "695": 12, "6m": 18, "7": 18, "70": [6, 10, 18], "707470": 16, "71": [6, 18], "7100000": 16, "7141797": 16, "7149": 16, "72": 18, "72dpi": 7, "73": 18, "73257": 16, "74": 18, "75": [9, 18], "7581382": 16, "76": 18, "77": 18, "772": 12, "772875": 16, "78": 18, "785": 12, "79": 18, "793533": 16, "796": 16, "798": 12, "7m": 18, "8": [8, 9, 18], "80": 18, "800": [8, 10, 16, 18], "81": 18, "82": 18, "83": 18, "84": 18, "849": 16, "85": 18, "8564453125": 18, "857": 18, "85875": 16, "86": 18, "8603515625": 18, "87": 18, "8707": 16, "88": 18, "89": 18, "9": [3, 9, 18], "90": 18, "90k": 6, "90kdict32px": 6, "91": 18, "914085328578949": 18, "92": 18, "93": 18, "94": [6, 18], "95": 
[10, 18], "9578408598899841": 18, "96": 18, "97": 18, "98": 18, "99": 18, "9949972033500671": 18, "A": [1, 2, 4, 6, 7, 8, 11, 17], "As": 2, "Be": 18, "Being": 1, "By": 13, "For": [1, 2, 3, 12, 18], "If": [2, 7, 8, 12, 18], "In": [2, 6, 16], "It": [9, 14, 15, 17], "Its": [4, 8], "No": [1, 18], "Of": 6, "Or": [15, 17], "The": [1, 2, 6, 7, 10, 13, 15, 16, 17, 18], "Then": 8, "To": [2, 3, 13, 14, 15, 17, 18], "_": [1, 6, 8], "__call__": 18, "_build": 2, "_i": 10, "ab": 6, "abc": 17, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 6, "abdef": [6, 16], "abl": [16, 18], "about": [1, 16, 18], "abov": 18, "abstractdataset": 6, "abus": 1, "accept": 1, "access": [4, 7, 16, 18], "account": [1, 14], "accur": 18, "accuraci": 10, "achiev": 17, "act": 1, "action": 1, "activ": 4, "ad": [2, 8, 9], "adapt": 1, "add": [9, 10, 14, 18], "add_hook": 18, "add_label": 10, "addit": [2, 3, 7, 15, 18], "addition": [2, 18], "address": [1, 7], "adjust": 9, "advanc": 1, "advantag": 17, "advis": 2, "aesthet": [4, 6], "affect": 1, "after": [14, 18], "ag": 1, "again": 8, "aggreg": [10, 16], "aggress": 1, "align": [1, 7, 9], "all": [1, 2, 5, 6, 7, 9, 10, 15, 16, 18], "allow": [1, 17], "along": 18, "alreadi": [2, 17], "also": [1, 8, 14, 15, 16, 18], "alwai": 16, "an": [1, 2, 4, 6, 7, 8, 10, 15, 17, 18], "analysi": [7, 15], "ancient_greek": 6, "angl": [7, 9], "ani": [1, 6, 7, 8, 9, 10, 17, 18], "annot": 6, "anot": 16, "anoth": [8, 12, 16], "answer": 1, "anyascii": 10, "anyon": 4, "anyth": 15, "api": [2, 4], "apolog": 1, "apologi": 1, "app": 2, "appear": 1, "appli": [1, 6, 9], "applic": [4, 8], "appoint": 1, "appreci": 14, "appropri": [1, 2, 18], "ar": [1, 2, 3, 5, 6, 7, 9, 10, 11, 15, 16, 18], "arab": 6, "arabic_diacrit": 6, "arabic_lett": 6, "arabic_punctu": 6, "arbitrarili": [4, 8], "arch": [8, 14], "architectur": [4, 8, 14, 15], "area": 18, "argument": [6, 7, 8, 10, 12, 18], "around": 1, "arrai": [7, 9, 10], "art": [4, 15], "artefact": [10, 15, 18], "artefact_typ": 7, "artifici": [4, 6], "arxiv": [6, 8], "asarrai": 10, "ascii_lett": 6, "aspect": [4, 8, 9, 18], "assess": 10, "assign": 10, "associ": 7, "assum": 8, "assume_straight_pag": [8, 12, 18], "astyp": [8, 10, 18], "attack": 1, "attend": [4, 8], "attent": [1, 8], "autom": 4, "automat": 18, "autoregress": [4, 8], "avail": [1, 4, 5, 9], "averag": [9, 18], "avoid": [1, 3], "aw": [4, 18], "awar": 18, "azur": 18, "b": [8, 10, 18], "b_j": 10, "back": 2, "backbon": 8, "backend": 18, "background": 16, "bangla": 6, "bar": 15, "bar_cod": 16, "base": [4, 8, 15], "baselin": [4, 8, 18], "batch": [6, 8, 9, 15, 16, 18], "batch_siz": [6, 12, 15, 16, 17], "bblanchon": 3, "bbox": 18, "becaus": 13, "been": [2, 10, 16, 18], "befor": [6, 8, 9, 18], "begin": 10, "behavior": [1, 18], "being": [10, 18], "belong": 18, "benchmark": 18, "best": 1, "better": [11, 18], "between": [9, 10, 18], "bgr": 7, "bilinear": 9, "bin_thresh": 18, "binar": [4, 8, 18], "binari": [7, 17, 18], "bit": 17, "block": [10, 18], "block_1_1": 18, "blur": 9, "bmvc": 6, "bn": 14, "bodi": [1, 18], "bool": [6, 7, 8, 9, 10], "boolean": [8, 18], "both": [4, 6, 9, 16, 18], "bottom": [8, 18], "bound": [6, 7, 8, 9, 10, 15, 16, 18], "box": [6, 7, 8, 9, 10, 15, 16, 18], "box_thresh": 18, "bright": 9, "browser": [2, 4], "build": [2, 3, 17], "built": 2, "byte": [7, 18], "c": [3, 7, 10], "c_j": 10, "cach": [2, 6, 13], "cache_sampl": 6, "call": 17, "callabl": [6, 9], "can": [2, 3, 12, 13, 14, 15, 16, 18], "capabl": [2, 11, 18], "case": [6, 10], "cf": 18, "cfg": 18, "challeng": 6, "challenge2_test_task12_imag": 6, 
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
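For readers inspecting the regenerated index above: searchindex.js is a single Search.setIndex(...) call whose payload maps Porter-stemmed tokens (so entries like "imag" or "localis" are stems, not corruption) to indices into the "titles" list. As a minimal sketch, not part of the diff, for querying the index offline — assuming the payload parses as JSON, as recent Sphinx versions emit, and that the file sits in the current directory:

    import json
    import re

    # Strip the Search.setIndex( ... ) wrapper and parse the payload.
    raw = open("searchindex.js", encoding="utf-8").read()
    payload = json.loads(re.search(r"setIndex\((.*)\)\s*$", raw, re.S).group(1))

    # "terms" values are either a single document index or a list of them.
    hits = payload["terms"].get("ocr", [])
    hits = [hits] if isinstance(hits, int) else hits
    print([payload["titles"][i] for i in hits])  # titles of pages mentioning "ocr"
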
diff --git a/using_doctr/custom_models_training.html b/using_doctr/custom_models_training.html
index 580b4368b7..e664c6a950 100644
--- a/using_doctr/custom_models_training.html
+++ b/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -615,7 +615,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/using_doctr/running_on_aws.html b/using_doctr/running_on_aws.html
index ddb0c3c80f..81c38b49f5 100644
--- a/using_doctr/running_on_aws.html
+++ b/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -358,7 +358,7 @@ AWS Lambda
-
+
diff --git a/using_doctr/sharing_models.html b/using_doctr/sharing_models.html
index 07a3b2f2a3..4f5d1d68a5 100644
--- a/using_doctr/sharing_models.html
+++ b/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -540,7 +540,7 @@ Recognition
-
+
diff --git a/using_doctr/using_contrib_modules.html b/using_doctr/using_contrib_modules.html
index b4a10925e6..cf282ff3a4 100644
--- a/using_doctr/using_contrib_modules.html
+++ b/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -411,7 +411,7 @@ ArtefactDetection
-
+
diff --git a/using_doctr/using_datasets.html b/using_doctr/using_datasets.html
index 4a52df36ba..e30b6d6459 100644
--- a/using_doctr/using_datasets.html
+++ b/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -638,7 +638,7 @@ Data Loading
-
+
diff --git a/using_doctr/using_model_export.html b/using_doctr/using_model_export.html
index 2b30ee63a1..ad9d09ed4c 100644
--- a/using_doctr/using_model_export.html
+++ b/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -463,7 +463,7 @@ Using your ONNX exported model
-
+
diff --git a/using_doctr/using_models.html b/using_doctr/using_models.html
index 13cb06116b..5c80dbf62d 100644
--- a/using_doctr/using_models.html
+++ b/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1249,7 +1249,7 @@ Advanced options
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/cord.html b/v0.1.0/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.0/_modules/doctr/datasets/cord.html
+++ b/v0.1.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/detection.html b/v0.1.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.0/_modules/doctr/datasets/detection.html
+++ b/v0.1.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/funsd.html b/v0.1.0/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.0/_modules/doctr/datasets/funsd.html
+++ b/v0.1.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic03.html b/v0.1.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.0/_modules/doctr/datasets/ic03.html
+++ b/v0.1.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic13.html b/v0.1.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.0/_modules/doctr/datasets/ic13.html
+++ b/v0.1.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiit5k.html b/v0.1.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiithws.html b/v0.1.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/imgur5k.html b/v0.1.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/loader.html b/v0.1.0/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.0/_modules/doctr/datasets/loader.html
+++ b/v0.1.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/mjsynth.html b/v0.1.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ocr.html b/v0.1.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.0/_modules/doctr/datasets/ocr.html
+++ b/v0.1.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/recognition.html b/v0.1.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.0/_modules/doctr/datasets/recognition.html
+++ b/v0.1.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/sroie.html b/v0.1.0/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.0/_modules/doctr/datasets/sroie.html
+++ b/v0.1.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svhn.html b/v0.1.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.0/_modules/doctr/datasets/svhn.html
+++ b/v0.1.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svt.html b/v0.1.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.0/_modules/doctr/datasets/svt.html
+++ b/v0.1.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/synthtext.html b/v0.1.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/utils.html b/v0.1.0/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.0/_modules/doctr/datasets/utils.html
+++ b/v0.1.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/wildreceipt.html b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.0/_modules/doctr/io/elements.html b/v0.1.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.0/_modules/doctr/io/elements.html
+++ b/v0.1.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.0/_modules/doctr/io/html.html b/v0.1.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.0/_modules/doctr/io/html.html
+++ b/v0.1.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/base.html b/v0.1.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.0/_modules/doctr/io/image/base.html
+++ b/v0.1.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/tensorflow.html b/v0.1.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/io/pdf.html b/v0.1.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.0/_modules/doctr/io/pdf.html
+++ b/v0.1.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.0/_modules/doctr/io/reader.html b/v0.1.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.0/_modules/doctr/io/reader.html
+++ b/v0.1.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/zoo.html b/v0.1.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/zoo.html b/v0.1.0/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/factory/hub.html b/v0.1.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.0/_modules/doctr/models/factory/hub.html
+++ b/v0.1.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/zoo.html b/v0.1.0/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/zoo.html b/v0.1.0/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.0/_modules/doctr/models/zoo.html
+++ b/v0.1.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/base.html b/v0.1.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/utils/metrics.html b/v0.1.0/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.0/_modules/doctr/utils/metrics.html
+++ b/v0.1.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.0/_modules/doctr/utils/visualization.html b/v0.1.0/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.0/_modules/doctr/utils/visualization.html
+++ b/v0.1.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.0/_modules/index.html b/v0.1.0/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.0/_modules/index.html
+++ b/v0.1.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.0/_sources/getting_started/installing.rst.txt b/v0.1.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.0/_sources/getting_started/installing.rst.txt
+++ b/v0.1.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.0/_static/basic.css b/v0.1.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.0/_static/basic.css
+++ b/v0.1.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.0/_static/doctools.js b/v0.1.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.0/_static/doctools.js
+++ b/v0.1.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.0/_static/language_data.js b/v0.1.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.0/_static/language_data.js
+++ b/v0.1.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.0/_static/searchtools.js b/v0.1.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.0/_static/searchtools.js
+++ b/v0.1.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
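For context on the hunk above: each search result tuple gains a seventh element, kind, and _displayItem now stamps it onto the generated list item as a kind-* CSS class. A minimal stand-alone sketch of that flow (the sample tuple below is invented for illustration, not taken from the diff):
// Sketch only: mirrors the destructuring and classList call added above.
const sample = ["changelog", "Changelog", "", null, 15, "changelog.html", "title"];
const [docName, title, anchor, descr, score, _filename, kind] = sample;
const li = document.createElement("li");
li.classList.add(`kind-${kind}`); // one of the four SearchResultKind values
console.log(li.className); // -> "kind-title"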
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
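The status-message hunk above replaces a single interpolated string with Documentation.ngettext, Sphinx's gettext-style plural helper, so translations can supply proper singular and plural forms. A simplified stand-in showing the contract (the real helper also consults the loaded message catalog):
// Simplified stand-in: pick the singular or plural template by count.
const ngettext = (singular, plural, n) => (n === 1 ? singular : plural);
const resultCount = 3;
console.log(
  ngettext(
    "Search finished, found one page matching the search query.",
    "Search finished, found ${resultCount} pages matching the search query.",
    resultCount,
  ).replace("${resultCount}", resultCount),
); // -> "Search finished, found 3 pages matching the search query."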
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
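The comment fix above only widens the documented tuple to include kind; the ordering contract itself is unchanged: ascending score, so that the pop() in _displayNextItem surfaces the best match first, with ties broken reverse-alphabetically by title for the same reason. One way to satisfy that contract, sketched here rather than quoted from Sphinx:
// Hedged sketch of the documented ordering, not the verbatim Sphinx body.
const byScoreThenName = (a, b) =>
  a[4] !== b[4]
    ? a[4] - b[4] // ascending score; the highest ends up last, popped first
    : b[1].toLowerCase().localeCompare(a[1].toLowerCase()); // reverse-alphabetical ties
const results = [
  ["doc", "Beta", "", null, 9, "doc.html", "text"],
  ["doc", "Alpha", "", null, 9, "doc.html", "text"],
  ["doc", "Gamma", "", null, 5, "doc.html", "text"],
];
results.sort(byScoreThenName);
console.log(results.map((r) => r[1])); // -> ["Gamma", "Beta", "Alpha"]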
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
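Taken together, these searchtools.js hunks tag every result group with its SearchResultKind (title, index, object, or text) before display. On the theme side that makes per-kind styling or filtering trivial; a hypothetical consumer sketch (the selectors follow the kind-* convention introduced above, and the styling choice is invented):
// Hypothetical theme-side usage on a rendered search results page.
document
  .querySelectorAll("#search-results ul.search li.kind-text")
  .forEach((li) => { li.style.opacity = "0.7"; }); // de-emphasize full-text matches
const objectHits = document.querySelectorAll("#search-results li.kind-object").length;
console.log(objectHits + " API object result(s)");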
diff --git a/v0.1.0/changelog.html b/v0.1.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.0/changelog.html
+++ b/v0.1.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.0/community/resources.html b/v0.1.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.0/community/resources.html
+++ b/v0.1.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.0/contributing/code_of_conduct.html b/v0.1.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.0/contributing/code_of_conduct.html
+++ b/v0.1.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.0/contributing/contributing.html b/v0.1.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.0/contributing/contributing.html
+++ b/v0.1.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.0/genindex.html b/v0.1.0/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.0/genindex.html
+++ b/v0.1.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.0/getting_started/installing.html b/v0.1.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.0/getting_started/installing.html
+++ b/v0.1.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.0/index.html b/v0.1.0/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.0/index.html
+++ b/v0.1.0/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.0/modules/contrib.html b/v0.1.0/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.0/modules/contrib.html
+++ b/v0.1.0/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.0/modules/datasets.html b/v0.1.0/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.0/modules/datasets.html
+++ b/v0.1.0/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.0/modules/io.html b/v0.1.0/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.0/modules/io.html
+++ b/v0.1.0/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.0/modules/models.html b/v0.1.0/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.0/modules/models.html
+++ b/v0.1.0/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/transforms.html b/v0.1.0/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.0/modules/transforms.html
+++ b/v0.1.0/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
-Search.setIndex({...})  [single-line minified search index for v0.1.0; identical in content to the latest/searchindex.js blob earlier in this diff and truncated in the source]
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.0/using_doctr/custom_models_training.html b/v0.1.0/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.0/using_doctr/custom_models_training.html
+++ b/v0.1.0/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.0/using_doctr/running_on_aws.html b/v0.1.0/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.0/using_doctr/running_on_aws.html
+++ b/v0.1.0/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({…}) (regenerated single-line searchindex.js: the auto-generated, minified Sphinx search index with the same top-level keys, "alltitles", "docnames", "envversion", "filenames", "indexentries", "objects", "objnames", "objtypes", "terms", "titles", "titleterms", with entries updated for this docs build; full contents omitted)
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
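
The rewritten constructor above replaces `sample_transforms` with a polygon option and two task flags. A minimal sketch of exercising the three modes, assuming this docTR release is installed and the archives download successfully (it mirrors the docstring example in the diff):

from doctr.datasets import CORD

# Default: (image, dict(boxes=..., labels=...)) samples with straight boxes
train_set = CORD(train=True, download=True)
img, target = train_set[0]

# Rotated (4, 2) polygons instead of xmin/ymin/xmax/ymax boxes
poly_set = CORD(train=True, download=True, use_polygons=True)

# Recognition mode: samples become (word crop, text label) pairs
reco_set = CORD(train=True, download=True, recognition_task=True)
crop, label = reco_set[0]

# Enabling both task flags raises the ValueError added in this diff:
# CORD(train=True, download=True, recognition_task=True, detection_task=True)
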
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets.core - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
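
This file is removed rather than updated: as the CORD diff above shows, `VisionDataset` now lives in `datasets.py` and is imported from `.datasets`. The download guard documented here appears to carry over unchanged; a quick sketch of the failure mode, assuming an empty ~/.cache/doctr/datasets:

from doctr.datasets import CORD

try:
    _ = CORD(train=True, download=False)  # no cached archive yet
except ValueError as err:
    # "the dataset needs to be downloaded first with download=True"
    print(err)
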
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# # List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
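
FUNSD gains the same three flags as CORD; for symmetry, a minimal sketch under the same assumptions as the CORD example above:

from doctr.datasets import FUNSD

full_set = FUNSD(train=True, download=True)                         # boxes + labels
det_set = FUNSD(train=False, download=True, detection_task=True)    # boxes only
reco_set = FUNSD(train=True, download=True, recognition_task=True)  # (crop, text) pairs
# Note: per the loop above, recognition mode drops crops whose labels contain
# checkbox glyphs or the two private-use characters listed in the filter.
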
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
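
The loader now maps `__getitem__` sequentially instead of using the removed `workers`-based multithreading, and exposes `collate_fn` and `__len__`. A short sketch of both, assuming a dataset built as in the CORD example above:

from doctr.datasets import CORD, DataLoader

train_set = CORD(train=True, download=True)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
print(len(train_loader))  # number of batches, via the new __len__

images, targets = next(iter(train_loader))  # default collation stacks images

# Custom collation (hypothetical): keep each batch as a raw list of samples
raw_loader = DataLoader(train_set, batch_size=8, collate_fn=lambda samples: samples)
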
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder the 8 raw coordinates into a (4, 2) array of (x, y) points:
+ # top left, top right, bottom right, bottom left corners (blank lines were filtered above)
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
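
When `use_polygons` is off, the `min`/`max` reduction above collapses each (4, 2) polygon to a straight box. A standalone numpy check of that exact step (no download needed; the row values are made up):

import numpy as np

rows = [["10", "20", "50", "18", "52", "40", "12", "42", "TOTAL"]]  # 8 coords + label
coords = np.stack(
    [np.array(list(map(int, row[:8])), dtype=np.float32).reshape((4, 2)) for row in rows], axis=0
)
boxes = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
print(boxes)  # [[10. 18. 52. 42.]] -> xmin, ymin, xmax, ymax
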
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionnary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char is still not in vocab, use the unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
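# Usage sketch (illustrative, assuming a backend where get_img_shape reads
# img.shape[:2]); polygons are given in absolute pixel coordinates:
import numpy as np

polys = np.random.randint(0, 64, size=(3, 4, 2)).astype(np.float32)
img, boxes_dict = pre_transform_multiclass(
    np.zeros((64, 64, 3), dtype=np.uint8), (polys, ["words", "titles", "words"])
)
# boxes_dict keys are the sorted class names:
# {"titles": (1, 4, 2) array, "words": (2, 4, 2) array}, in relative coords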
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- doctr.documents.elements - docTR documentation
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
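# Export sketch (illustrative): a Word has no children, so export() only
# serializes its _exported_keys.
word = Word("hello", 0.99, ((0.1, 0.2), (0.3, 0.25)))
word.render()  # 'hello'
word.export()  # {'value': 'hello', 'confidence': 0.99, 'geometry': ((0.1, 0.2), (0.3, 0.25))}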
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
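# Hierarchy sketch (illustrative): Document > Page > Block > Line > Word,
# with render() joining words by ' ', lines by '\n', blocks by '\n\n' and
# pages by '\n\n\n\n'.
page = Page(blocks=[Block(lines=[Line([w1, w2])])], page_idx=0, dimensions=(595, 842))
doc = Document(pages=[page])
doc.render()  # 'doc TR'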
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- doctr.documents.reader - docTR documentation
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document opened as a fitz.Document
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Default goes to 840 x 595 for A4 pdf,
- if you want to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
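# Scale arithmetic sketch for an A4 page (MediaBox of 595 x 842 points at 72 dpi):
# output_size=(1024, 726) -> scales = (726 / 595, 1024 / 842) ~ (1.22, 1.22),
# i.e. roughly 88 dpi rendering, while the default scales of (2, 2) render at 144 dpi.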
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to unshrink polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: The first parameter.
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly casted to a ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
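# Worked example of the unclip distance (illustrative): for a 100 x 20 px box,
#   area = 2000, perimeter = 240, unclip_ratio = 1.5
#   distance = 2000 * 1.5 / 240 = 12.5 px
# so the polygon edges are pushed outwards by ~12.5 px before cv2.boundingRect
# recovers the final (x, y, w, h).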
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1): # top-down; range(len(results) - 1, -1) would be empty
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature maps is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.ndarray,
- ys: np.ndarray,
- a: np.ndarray,
- b: np.ndarray,
- eps: float = 1e-7,
- ) -> np.ndarray:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
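# Sanity-check sketch (illustrative): the grid point (1, 1) against the
# horizontal segment a=(0, 0), b=(2, 0) should be at distance 1.
import numpy as np

xs, ys = np.array([[1.0]]), np.array([[1.0]])
DBNet.compute_distance(xs, ys, np.array([0.0, 0.0]), np.array([2.0, 0.0]))
# -> array([[1.]])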
-
- def draw_thresh_map(
- self,
- polygon: np.ndarray,
- canvas: np.ndarray,
- mask: np.ndarray,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon : array of coord., to draw the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
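# Shrink distance sketch (illustrative): for a 100 x 20 px ground-truth box
# with shrink_ratio = 0.4,
#   area = 2000, perimeter = 240
#   distance = 2000 * (1 - 0.4 ** 2) / 240 = 7.0 px
# and Execute(-distance) contracts the polygon by ~7 px before drawing it
# on the segmentation target.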
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num): # valid labels are 1..label_num - 1 (0 is background)
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
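# Resolution/channel sketch for a 512 x 512 input after the stem (/4 -> 128 x 128 x 64):
#   encoder_1 -> 64 x 64 x 64    encoder_2 -> 32 x 32 x 128
#   encoder_3 -> 16 x 16 x 256   encoder_4 -> 8 x 8 x 512
# Each decoder upsamples x2 and its output is summed with the matching encoder
# map, so y_1 comes back to the stem resolution (128 x 128 x 64) before the
# classifier head upsamples x4 to full resolution.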
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@ Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
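As a quick illustration of the reworked factory above, it now accepts either an architecture name or an already-instantiated model; a minimal usage sketch (assuming pretrained weights can be downloaded):

>>> import numpy as np
>>> from doctr.models import detection, detection_predictor
>>> # 1. By architecture name (FAST models are reparameterized automatically)
>>> predictor = detection_predictor(arch="fast_base", pretrained=True)
>>> # 2. By passing a model instance directly
>>> model = detection.db_resnet50(pretrained=True)
>>> predictor = detection_predictor(arch=model)
>>> page = (255 * np.random.rand(1024, 1024, 3)).astype(np.uint8)
>>> out = predictor([page])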
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
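Building on the docstring example just above, the returned bytes can simply be written to disk to obtain a .tflite artifact (standard file I/O, nothing docTR-specific):

>>> with open('model.tflite', 'wb') as f:
...     _ = f.write(serialized_model)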
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
\ No newline at end of file
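For reference, the int8 model produced by the (since removed) quantize_model helper can be exercised with the stock TFLite interpreter; a minimal sketch, assuming `serialized_model` holds the bytes returned by quantize_model:

>>> import numpy as np
>>> import tensorflow as tf
>>> interpreter = tf.lite.Interpreter(model_content=serialized_model)
>>> interpreter.allocate_tensors()
>>> inp = interpreter.get_input_details()[0]
>>> dummy = np.zeros(inp['shape'], dtype=np.int8)  # int8 I/O after full quantization
>>> interpreter.set_tensor(inp['index'], dummy)
>>> interpreter.invoke()
>>> out = interpreter.get_tensor(interpreter.get_output_details()[0]['index'])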
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.models.recognition.crnn - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
- with label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
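For intuition, the greedy CTC collapse performed above (merge repeated symbols, then drop blanks) can be written by hand in a few lines; the vocabulary and path below are illustrative:

>>> import numpy as np
>>> vocab = "ab"
>>> blank = len(vocab)  # blank index placed right after the vocabulary
>>> path = np.array([0, 0, blank, 1, 1, blank, blank, 0])  # per-timestep argmax
>>> merged = [int(k) for i, k in enumerate(path) if i == 0 or k != path[i - 1]]
>>> "".join(vocab[k] for k in merged if k != blank)
'aba'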
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of ground-truth words inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
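Note that _crnn lets callers override the vocabulary (and rnn_units) through kwargs; a minimal sketch for a digits-only recognizer, with pretrained weights left off since changing the vocab changes the size of the classification head:

>>> from doctr.models import crnn_vgg16_bn
>>> model = crnn_vgg16_bn(pretrained=False, vocab="0123456789")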
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embeded_symbol: shape (N, embedding_units)
- embeded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embeded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
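A small standalone sketch of the masking step above (illustrative lengths), showing how timesteps past each sequence's <eos> are zeroed before averaging:

>>> import tensorflow as tf
>>> seq_len = tf.constant([2, 4])           # per-sample lengths, <eos> included
>>> mask_2d = tf.sequence_mask(seq_len, 5)  # shape (2, 5), True inside each sequence
>>> cce = tf.ones((2, 5))                   # stand-in for per-timestep losses
>>> masked = tf.where(mask_2d, cce, tf.zeros_like(cce))
>>> tf.reduce_sum(masked, axis=1) / tf.cast(seq_len, tf.float32)  # -> [1., 1.]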
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example:
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
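A minimal usage sketch of the updated factory (assuming pretrained weights are available); in recent versions the predictor returns one (value, confidence) pair per crop:

>>> import numpy as np
>>> from doctr.models import recognition_predictor
>>> predictor = recognition_predictor("crnn_vgg16_bn", pretrained=True)
>>> crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)
>>> out = predictor([crop])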
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the general page orientation
+ based on the segmentation map median line orientation.
+ Then, rotates the page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
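Putting the pieces together, an end-to-end sketch reading an image through doctr.io ("page.jpg" is a hypothetical local file):

>>> from doctr.io import DocumentFile
>>> from doctr.models import ocr_predictor
>>> doc = DocumentFile.from_images(["page.jpg"])
>>> predictor = ocr_predictor(pretrained=True)
>>> result = predictor(doc)
>>> json_export = result.export()  # nested dict of pages/blocks/lines/words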
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the general page orientation
+ based on the segmentation map median line orientation.
+ Then, rotates the page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
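Usage mirrors ocr_predictor; a minimal sketch (the per-class `predictions` attribute below reflects recent KIEPredictor output and should be treated as an assumption):

>>> import numpy as np
>>> from doctr.models import kie_predictor
>>> model = kie_predictor(pretrained=True)
>>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([input_page])
>>> preds = out.pages[0].predictions  # dict mapping class names to detected objects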
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example::
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: the offset added to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example::
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example::
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: the hue offset added to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example::
- >>> from doctr.transforms import RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
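+
+# A quick sketch of the four tolerance levels (illustrative, not part of the module):
+# string_match("Épée", "epee") -> (False, False, False, True)
+# only the lower-case anyascii ("unicase") comparison matches here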
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
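+# Worked example (an illustration, not part of the module):
+# box_iou(np.array([[0, 0, 2, 2]]), np.array([[1, 1, 3, 3]]))
+# -> array([[0.1428...]]) since intersection = 1 and union = 4 + 4 - 1 = 7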
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
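+
+# The polygon variant gives the same result on axis-aligned squares (illustrative values):
+# polygon_iou(np.array([[[0, 0], [2, 0], [2, 2], [0, 2]]], dtype=float),
+#             np.array([[[1, 1], [3, 1], [3, 3], [1, 3]]], dtype=float))
+# -> array([[0.1428...]], dtype=float32), i.e. 1 / 7 as in the box_iou example above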
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
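+
+# Usage sketch (values are illustrative): two heavily overlapping boxes and one isolated box
+# nms(np.array([[0, 0, 10, 10, 0.9], [1, 1, 10, 10, 0.8], [20, 20, 30, 30, 0.7]]), thresh=0.5)
+# -> [0, 2]: the second box (IoU 0.81 with the first) is suppressed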
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and strings for both the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and strings for both the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
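+
+# Dispatch sketch (illustrative): a 2-point tuple yields a Rectangle, a (4, 2) array a Polygon
+# create_obj_patch(((0.1, 0.1), (0.3, 0.2)), (600, 800)) -> matplotlib.patches.Rectangle
+# create_obj_patch(np.full((4, 2), 0.5), (600, 800)) -> matplotlib.patches.Polygon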
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
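+
+# e.g. get_colors(3) samples hues at 0°, 120° and 240° with slightly jittered
+# lightness/saturation, giving three visually distinct RGB tuples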
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape than page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mlp Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape than page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%})",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mlp Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
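+
+# Usage sketch (illustrative): draw one relative box on a blank page
+# page = np.full((100, 200, 3), 255, dtype=np.uint8)
+# draw_boxes(np.array([[0.1, 0.1, 0.4, 0.3]]), page)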
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save significant time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-.. autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its own way of loading a sample, but batch aggregation and the underlying iteration are deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
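-
-A minimal usage sketch (the exact signature should be checked against the API reference above; the vocab string is taken from the table):
-
-.. code:: python
-
-    from doctr.datasets import encode_sequences
-
-    # map each character of each word to its index in the "digits" vocab
-    encoded = encode_sequences(sequences=["123", "456"], vocab="0123456789")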
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same height in different columns are considered to belong to two separate Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
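-
-The hierarchy above can be traversed naturally; a minimal sketch (assuming ``doc`` is a ``Document`` instance):
-
-.. code:: python
-
-    for page in doc.pages:
-        for block in page.blocks:
-            for line in block.lines:
-                for word in line.words:
-                    print(word.value)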
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
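-
-A minimal reading sketch (file paths are placeholders):
-
-.. code:: python
-
-    from doctr.documents import DocumentFile
-
-    pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-    pages = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])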
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developer mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Whether they are performed jointly or separately, each task corresponds to a specific type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up; we then measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-Experiments were run on an AWS c5.12xlarge instance (Xeon Platinum 8275L CPU).
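-
-A sketch of this protocol (the timing loop is an assumption, not the exact benchmark script):
-
-.. code:: python
-
-    import time
-    import tensorflow as tf
-
-    def measure_fps(model, shape=(1, 1024, 1024, 3), warmup=100, runs=1000):
-        for _ in range(warmup):
-            model(tf.random.uniform(shape))
-        start = time.perf_counter()
-        for _ in range(runs):
-            model(tf.random.uniform(shape))  # tensor creation included, negligible vs inference
-        return runs / (time.perf_counter() - start)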
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
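-
-A minimal sketch of this scheme (target size and normalization statistics are placeholders, not the library's actual values):
-
-.. code:: python
-
-    import tensorflow as tf
-
-    def preprocess_detection(images, target_size=(1024, 1024), mean=0.5, std=0.5):
-        # 1. resize (bilinear), potentially deforming the aspect ratio
-        resized = [tf.image.resize(img, target_size, method="bilinear") for img in images]
-        # 2. batch images together
-        batch = tf.stack(resized, axis=0)
-        # 3. normalize with the training data statistics
-        return (batch - mean) / std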
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (a binary segmentation map, for instance) into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors let you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
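-
-A usage sketch (the pretrained flag and the nesting of the input list are assumptions; refer to the function documentation above):
-
-.. code:: python
-
-    import numpy as np
-    from doctr.models.detection import detection_predictor
-
-    predictor = detection_predictor(pretrained=True)
-    out = predictor([[np.zeros((1024, 1024, 3), dtype=np.uint8)]])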
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 30595 word-level crops, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 32, 128, 3] as a warm-up; we then measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-Experiments were run on an AWS c5.12xlarge instance (Xeon Platinum 8275L CPU).
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following (a short sketch is given after the list):
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
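-A minimal sketch of the resize-and-pad steps with plain TensorFlow ops (the normalization statistics below are placeholders):
-
- >>> import tensorflow as tf
- >>> def preprocess_crop(crop, target=(32, 128), mean=0.5, std=1.0):
- ...     # 1. resize while preserving the aspect ratio, 2. pad up to the target size
- ...     resized = tf.image.resize(crop, target, preserve_aspect_ratio=True)
- ...     padded = tf.image.pad_to_bounding_box(resized, 0, 0, target[0], target[1])
- ...     # 4. normalize (batching over crops then works as for detection)
- ...     return (padded - mean) / std
-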
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification over the sequence) into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Predictors combine the right components around a given architecture for easier usage: they let you pass numpy crops as inputs and return the recognized strings.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
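-For instance (assuming 'crnn_vgg16_bn' is a valid architecture name for this predictor):
-
- >>> import numpy as np
- >>> from doctr.models.recognition import recognition_predictor
- >>> predictor = recognition_predictor('crnn_vgg16_bn', pretrained=True)
- >>> crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)
- >>> out = predictor([crop])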
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models used in these predictors are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined hold 199 pages, which might not be representative enough of the model capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the predictor, warm it up, and then measure its average end-to-end speed on the datasets with a batch size of 1.
-We used a c5.12xlarge AWS instance (CPU: Xeon Platinum 8275L) to perform the experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection and one stage of text recognition: the detection output is used to produce cropped images that are then passed to the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
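-A minimal end-to-end sketch (the det_arch/reco_arch argument names are assumptions for this version):
-
- >>> from doctr.documents import DocumentFile
- >>> from doctr.models import ocr_predictor
- >>> predictor = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
- >>> result = predictor(pages)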
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
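-As a sketch of how these utilities can be chained (we assume here that each returns the serialized model as bytes; exact signatures may differ):
-
- >>> from doctr.models import db_resnet50
- >>> from doctr.models.export import convert_to_tflite, convert_to_fp16
- >>> model = db_resnet50(pretrained=True)
- >>> tflite_bytes = convert_to_tflite(model)
- >>> fp16_bytes = convert_to_fp16(model)
- >>> with open('db_resnet50.tflite', 'wb') as f:
- ...     f.write(tflite_bytes)
-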
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to the SavedModel format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedures. Drawing inspiration from the design of torchvision, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively; a composition example is given after the class list below.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
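-
-For instance, composing a deterministic resize with a randomly applied color inversion (exact signatures may vary):
-
- >>> import tensorflow as tf
- >>> from doctr.transforms import Compose, Resize, ColorInversion, RandomApply
- >>> transfo = Compose([Resize((32, 128)), RandomApply(ColorInversion(), p=.5)])
- >>> out = transfo(tf.random.uniform(shape=[64, 256, 3], maxval=1, dtype=tf.float32))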
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module groups non-core features that complement the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
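-A typical usage sketch (assuming each page of the predictor output exposes an export() dictionary):
-
- >>> import matplotlib.pyplot as plt
- >>> from doctr.documents import DocumentFile
- >>> from doctr.models import ocr_predictor
- >>> from doctr.utils.visualization import visualize_page
- >>> predictor = ocr_predictor(pretrained=True)
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
- >>> result = predictor(pages)
- >>> visualize_page(result.pages[0].export(), pages[0])
- >>> plt.show()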
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model's performance.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
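-
-For instance, a minimal sketch with ExactMatch (assuming update() takes the ground-truth and predicted string lists):
-
- >>> from doctr.utils.metrics import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()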
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its own way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-docTR Vocabs¶
-
-Name          | Size | Characters
-digits        | 10   | 0123456789
-ascii_letters | 52   | abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-punctuation   | 32   | !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-currency      | 5    | £€¥¢฿
-latin         | 96   | 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-french        | 154  | 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
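-For instance (using a plain lowercase vocab purely for illustration):
->>> from doctr.datasets import encode_sequences
->>> encoded = encode_sequences(sequences=["hello", "world"], vocab="abcdefghijklmnopqrstuvwxyz", target_size=10)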
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page's size
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words on the same horizontal level still form two separate Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and the confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
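-The hierarchy can be illustrated by assembling a minimal document by hand (geometry values are arbitrary relative coordinates, and we assume Document simply takes the list of pages):
->>> from doctr.documents import Word, Line, Block, Page, Document
->>> word = Word("hello", 0.99, ((0.1, 0.1), (0.3, 0.2)))
->>> line = Line([word])
->>> block = Block([line])
->>> page = Page([block], page_idx=0, dimensions=(896, 672))
->>> doc = Document([page])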
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert its pages into images in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read the web page at a given URL and convert it into a PDF file, returned as a bytes stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
@@ -224,15 +224,42 @@
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architecture's speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-Getting Started¶
-- Installation
-Contents¶
@@ -364,7 +381,7 @@ Contents
docTR Notebooks
diff --git a/search.html b/search.html
index f0693e2c97..0e0da5efb3 100644
--- a/search.html
+++ b/search.html
@@ -14,7 +14,7 @@
Search - docTR documentation
@@ -336,7 +336,7 @@
diff --git a/searchindex.js b/searchindex.js
index 8598997441..df18967072 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
"challenge2_test_task1_gt": 6, "challenge2_training_task12_imag": 6, "challenge2_training_task1_gt": 6, "chang": [13, 18], "channel": [1, 2, 7, 9], "channel_prior": 3, "channelshuffl": 9, "charact": [4, 6, 7, 10, 16, 18], "charactergener": [6, 16], "characterist": 1, "charg": 18, "charset": 18, "chart": 7, "check": [2, 14, 18], "checkpoint": 8, "chip": 3, "ci": 2, "clarifi": 1, "clariti": 1, "class": [1, 6, 7, 9, 10, 18], "class_nam": 12, "classif": [16, 18], "classmethod": 7, "clear": 2, "clone": 3, "close": 2, "co": 14, "code": [4, 7, 15], "codecov": 2, "colab": 11, "collate_fn": 6, "collect": [7, 15], "color": 9, "colorinvers": 9, "column": 7, "com": [1, 3, 7, 8, 14], "combin": 18, "command": [2, 15], "comment": 1, "commit": 1, "common": [1, 9, 10, 17], "commun": 1, "compar": 4, "comparison": [10, 18], "competit": 6, "compil": [11, 18], "complaint": 1, "complementari": 10, "complet": 2, "compon": 18, "compos": [6, 18], "comprehens": 18, "comput": [6, 10, 17, 18], "conf_threshold": 15, "confid": [7, 18], "config": [3, 8], "configur": 8, "confus": 10, "consecut": [9, 18], "consequ": 1, "consid": [1, 2, 6, 7, 10, 18], "consist": 18, "consolid": [4, 6], "constant": 9, "construct": 1, "contact": 1, "contain": [5, 6, 11, 16, 18], "content": [6, 7, 18], "context": 8, "contib": 3, "continu": 1, "contrast": 9, "contrast_factor": 9, "contrib": [3, 15], "contribut": 1, "contributor": 2, "convers": 7, "convert": [7, 9], "convolut": 8, "coordin": [7, 18], "cord": [4, 6, 16, 18], "core": [10, 18], "corner": 18, "correct": 9, "correspond": [3, 7, 9, 18], "could": [1, 15], "counterpart": 10, "cover": 2, "coverag": 2, "cpu": [4, 12, 17], "creat": 14, "crnn": [4, 8, 14], "crnn_mobilenet_v3_larg": [8, 14, 18], "crnn_mobilenet_v3_smal": [8, 17, 18], "crnn_vgg16_bn": [8, 12, 14, 18], "crop": [7, 8, 9, 12, 16, 18], "crop_orient": [7, 18], "crop_orientation_predictor": [8, 12], "crop_param": 12, "cuda": 17, "currenc": 6, "current": [2, 12, 18], "custom": [14, 15, 17, 18], "custom_crop_orientation_model": 12, "custom_page_orientation_model": 12, "customhook": 18, "cvit": 4, "czczup": 8, "czech": 6, "d": [6, 16], "danish": 6, "data": [4, 6, 7, 9, 10, 12, 14], "dataload": 16, "dataset": [8, 12, 18], "dataset_info": 6, "date": [12, 18], "db": 14, "db_mobilenet_v3_larg": [8, 14, 18], "db_resnet34": 18, "db_resnet50": [8, 12, 14, 18], "dbnet": [4, 8], "deal": [11, 18], "decis": 1, "decod": 7, "decode_img_as_tensor": 7, "dedic": 17, "deem": 1, "deep": [8, 18], "def": 18, "default": [3, 7, 12, 13, 18], "defer": 16, "defin": [10, 17], "degre": [7, 9, 18], "degress": 7, "delet": 2, "delimit": 18, "delta": 9, "demo": [2, 4], "demonstr": 1, "depend": [2, 3, 4, 18], "deploi": 2, "deploy": 4, "derogatori": 1, "describ": 8, "descript": 11, "design": 9, "desir": 7, "det_arch": [8, 12, 14, 17], "det_b": 18, "det_model": [12, 14, 17], "det_param": 12, "det_predictor": [12, 18], "detail": [12, 18], "detect": [6, 7, 10, 11, 12, 15], "detect_languag": 8, "detect_orient": [8, 12, 18], "detection_predictor": [8, 18], "detection_task": [6, 16], "detectiondataset": [6, 16], "detectionmetr": 10, "detectionpredictor": [8, 12], "detector": [4, 8, 15], "deterior": 8, "determin": 1, "dev": [2, 13], "develop": 3, "deviat": 9, "devic": 17, "dict": [7, 10, 18], "dictionari": [7, 10], "differ": 1, "differenti": [4, 8], "digit": [4, 6, 16], "dimens": [7, 10, 18], "dimension": 9, "direct": 6, "directli": [14, 18], "directori": [2, 13], "disabl": [1, 13, 18], "disable_crop_orient": 18, "disable_page_orient": 18, "disclaim": 18, "discuss": 2, 
"disparag": 1, "displai": [7, 10], "display_artefact": 10, "distribut": 9, "div": 18, "divers": 1, "divid": 7, "do": [2, 3, 8], "doc": [2, 7, 15, 17, 18], "docartefact": [6, 16], "docstr": 2, "doctr": [3, 12, 13, 14, 15, 16, 17, 18], "doctr_cache_dir": 13, "doctr_multiprocessing_dis": 13, "document": [6, 8, 10, 11, 12, 15, 16, 17, 18], "documentbuild": 18, "documentfil": [7, 12, 14, 15, 17], "doesn": 17, "don": [12, 18], "done": 9, "download": [6, 16], "downsiz": 8, "draw": 9, "drop": 6, "drop_last": 6, "dtype": [7, 8, 9, 10, 17], "dual": [4, 6], "dummi": 14, "dummy_img": 18, "dummy_input": 17, "dure": 1, "dutch": 6, "dynam": [6, 15], "dynamic_seq_length": 6, "e": [1, 2, 3, 7, 8], "each": [4, 6, 7, 8, 9, 10, 16, 18], "eas": 2, "easi": [4, 10, 14, 17], "easili": [7, 10, 12, 14, 16, 18], "econom": 1, "edit": 1, "educ": 1, "effect": 18, "effici": [2, 4, 6, 8], "either": [10, 18], "element": [6, 7, 8, 18], "els": [2, 15], "email": 1, "empathi": 1, "en": 18, "enabl": [6, 7], "enclos": 7, "encod": [4, 6, 7, 8, 18], "encode_sequ": 6, "encount": 2, "encrypt": 7, "end": [4, 6, 8, 10], "english": [6, 16], "enough": [2, 18], "ensur": 2, "entri": 6, "environ": [1, 13], "eo": 6, "equiv": 18, "estim": 8, "etc": [7, 15], "ethnic": 1, "evalu": [16, 18], "event": 1, "everyon": 1, "everyth": [2, 18], "exact": [10, 18], "exampl": [1, 2, 4, 6, 8, 14, 18], "exchang": 17, "execut": 18, "exist": 14, "expand": 9, "expect": [7, 9, 10], "experi": 1, "explan": [1, 18], "explicit": 1, "exploit": [4, 8], "export": [7, 8, 10, 11, 15, 18], "export_as_straight_box": [8, 18], "export_as_xml": 18, "export_model_to_onnx": 17, "express": [1, 9], "extens": 7, "extern": [1, 16], "extract": [4, 6], "extractor": 8, "f_": 10, "f_a": 10, "factor": 9, "fair": 1, "fairli": 1, "fals": [6, 7, 8, 9, 10, 12, 18], "faq": 1, "fascan": 14, "fast": [4, 6, 8], "fast_bas": [8, 18], "fast_smal": [8, 18], "fast_tini": [8, 18], "faster": [4, 8, 17], "fasterrcnn_mobilenet_v3_large_fpn": 8, "favorit": 18, "featur": [3, 8, 10, 11, 12, 15], "feedback": 1, "feel": [2, 14], "felix92": 14, "few": [17, 18], "figsiz": 10, "figur": [10, 15], "file": [2, 6], "final": 8, "find": [2, 16], "finnish": 6, "first": [2, 6], "firsthand": 6, "fit": [8, 18], "flag": 18, "flip": 9, "float": [7, 9, 10, 17], "float32": [7, 8, 9, 17], "fn": 9, "focu": 14, "focus": [1, 6], "folder": 6, "follow": [1, 2, 3, 6, 9, 10, 12, 13, 14, 15, 18], "font": 6, "font_famili": 6, "foral": 10, "forc": 2, "forg": 3, "form": [4, 6, 18], "format": [7, 10, 12, 16, 17, 18], "forpost": [4, 6], "forum": 2, "fp16": 17, "frac": 10, "framework": [3, 14, 16, 18], "free": [1, 2, 14], "french": [6, 12, 14, 18], "friendli": 4, "from": [1, 4, 6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18], "from_hub": [8, 14], "from_imag": [7, 14, 15, 17], "from_pdf": 7, "from_url": 7, "full": [6, 10, 18], "function": [6, 9, 10, 15], "funsd": [4, 6, 16, 18], "further": 16, "futur": 6, "g": [7, 8], "g_": 10, "g_x": 10, "gamma": 9, "gaussian": 9, "gaussianblur": 9, "gaussiannois": 9, "gen": 18, "gender": 1, "gener": [2, 4, 7, 8], "generic_cyrillic_lett": 6, "geometri": [4, 7, 18], "geq": 10, "german": [6, 12, 14], "get": [17, 18], "git": 14, "github": [2, 3, 8, 14], "give": [1, 15], "given": [6, 7, 9, 10, 18], "global": 8, "go": 18, "good": 17, "googl": 2, "googlevis": 4, "gpu": [4, 15, 17], "gracefulli": 1, "graph": [4, 6, 7], "grayscal": 9, "ground": 10, "groung": 10, "group": [4, 18], "gt": 10, "gt_box": 10, "gt_label": 10, "guid": 2, "guidanc": 16, "gvision": 18, "h": [7, 8, 9], "h_": 10, "ha": [2, 6, 10, 16], "handl": [11, 
16, 18], "handwrit": 6, "handwritten": 16, "harass": 1, "hardwar": 18, "harm": 1, "hat": 10, "have": [1, 2, 10, 12, 14, 16, 17, 18], "head": [8, 18], "healthi": 1, "hebrew": 6, "height": [7, 9], "hello": [10, 18], "help": 17, "here": [5, 9, 11, 15, 16, 18], "hf": 8, "hf_hub_download": 8, "high": 7, "higher": [3, 6, 18], "hindi": 6, "hindi_digit": 6, "hocr": 18, "hook": 18, "horizont": [7, 9, 18], "hous": 6, "how": [2, 11, 12, 14, 16], "howev": 16, "hsv": 9, "html": [1, 2, 3, 7, 18], "http": [1, 3, 6, 7, 8, 14, 18], "hub": 8, "hue": 9, "huggingfac": 8, "hw": 6, "i": [1, 2, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17], "i7": 18, "ic03": [4, 6, 16], "ic13": [4, 6, 16], "icdar": [4, 6], "icdar2019": 6, "id": 18, "ident": 1, "identifi": 4, "iiit": [4, 6], "iiit5k": [6, 16], "iiithw": [4, 6, 16], "imag": [4, 6, 7, 8, 9, 10, 14, 15, 16, 18], "imagenet": 8, "imageri": 1, "images_90k_norm": 6, "img": [6, 9, 16, 17], "img_cont": 7, "img_fold": [6, 16], "img_path": 7, "img_transform": 6, "imgur5k": [4, 6, 16], "imgur5k_annot": 6, "imlist": 6, "impact": 1, "implement": [6, 7, 8, 9, 10, 18], "import": [6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18], "improv": 8, "inappropri": 1, "incid": 1, "includ": [1, 6, 16, 17], "inclus": 1, "increas": 9, "independ": 9, "index": [2, 7], "indic": 10, "individu": 1, "infer": [4, 8, 9, 15, 18], "inform": [1, 2, 4, 6, 16], "input": [2, 7, 8, 9, 17, 18], "input_crop": 8, "input_pag": [8, 10, 18], "input_shap": 17, "input_tensor": 8, "inspir": [1, 9], "instal": [14, 15, 17], "instanc": [1, 18], "instanti": [8, 18], "instead": [6, 7, 8], "insult": 1, "int": [6, 7, 9], "int64": 10, "integ": 10, "integr": [4, 14, 16], "intel": 18, "interact": [1, 7, 10], "interfac": [14, 17], "interoper": 17, "interpol": 9, "interpret": [6, 7], "intersect": 10, "invert": 9, "investig": 1, "invis": 1, "involv": [1, 18], "io": [12, 14, 15, 17], "iou": 10, "iou_thresh": 10, "iou_threshold": 15, "irregular": [4, 8, 16], "isn": 6, "issu": [1, 2, 14], "italian": 6, "iter": [6, 9, 16, 18], "its": [7, 8, 9, 10, 16, 18], "itself": [8, 14], "j": 10, "job": 2, "join": 2, "jpeg": 9, "jpegqual": 9, "jpg": [6, 7, 14, 17], "json": [6, 16, 18], "json_output": 18, "jump": 2, "just": 1, "kei": [4, 6], "kera": [8, 17], "kernel": [4, 8, 9], "kernel_shap": 9, "keywoard": 8, "keyword": [6, 7, 8, 10], "kie": [8, 12], "kie_predictor": [8, 12], "kiepredictor": 8, "kind": 1, "know": [2, 17], "kwarg": [6, 7, 8, 10], "l": 10, "l_j": 10, "label": [6, 10, 15, 16], "label_fil": [6, 16], "label_fold": 6, "label_path": [6, 16], "labels_path": [6, 16], "ladder": 1, "lambda": 9, "lambdatransform": 9, "lang": 18, "languag": [1, 4, 6, 7, 8, 14, 18], "larg": [8, 14], "largest": 10, "last": [3, 6], "latenc": 8, "later": 2, "latest": 18, "latin": 6, "layer": 17, "layout": 18, "lead": 1, "leader": 1, "learn": [1, 4, 8, 17, 18], "least": 3, "left": [10, 18], "legacy_french": 6, "length": [6, 18], "less": [17, 18], "level": [1, 6, 10, 18], "leverag": 11, "lf": 14, "librari": [2, 3, 11, 12], "light": 4, "lightweight": 17, "like": 1, "limits_": 10, "line": [4, 8, 10, 18], "line_1_1": 18, "link": 12, "linknet": [4, 8], "linknet_resnet18": [8, 12, 17, 18], "linknet_resnet34": [8, 17, 18], "linknet_resnet50": [8, 18], "list": [6, 7, 9, 10, 14], "ll": 10, "load": [4, 6, 8, 15, 17], "load_state_dict": 12, "load_weight": 12, "loc_pr": 18, "local": [2, 4, 6, 8, 10, 16, 18], "localis": 6, "localizationconfus": 10, "locat": [2, 7, 18], "login": 8, "login_to_hub": [8, 14], "logo": [7, 15, 16], "love": 14, "lower": [9, 10, 18], "m": [2, 10, 18], "m1": 3, 
"macbook": 3, "machin": 17, "made": 4, "magc_resnet31": 8, "mai": [1, 2], "mail": 1, "main": 11, "maintain": 4, "mainten": 2, "make": [1, 2, 10, 12, 13, 14, 17, 18], "mani": [16, 18], "manipul": 18, "map": [6, 8], "map_loc": 12, "master": [4, 8, 18], "match": [10, 18], "mathcal": 10, "matplotlib": [7, 10], "max": [6, 9, 10], "max_angl": 9, "max_area": 9, "max_char": [6, 16], "max_delta": 9, "max_gain": 9, "max_gamma": 9, "max_qual": 9, "max_ratio": 9, "maximum": [6, 9], "maxval": [8, 9], "mbox": 10, "mean": [9, 10, 12], "meaniou": 10, "meant": [7, 17], "measur": 18, "media": 1, "median": 8, "meet": 12, "member": 1, "memori": [13, 17], "mention": 18, "merg": 6, "messag": 2, "meta": 18, "metadata": 17, "metal": 3, "method": [7, 9, 18], "metric": [10, 18], "middl": 18, "might": [17, 18], "min": 9, "min_area": 9, "min_char": [6, 16], "min_gain": 9, "min_gamma": 9, "min_qual": 9, "min_ratio": 9, "min_val": 9, "minde": [1, 3, 4, 8], "minim": [2, 4], "minimalist": [4, 8], "minimum": [3, 6, 9, 10, 18], "minval": 9, "miss": 3, "mistak": 1, "mixed_float16": 17, "mixed_precis": 17, "mjsynth": [4, 6, 16], "mnt": 6, "mobilenet": [8, 14], "mobilenet_v3_larg": 8, "mobilenet_v3_large_r": 8, "mobilenet_v3_smal": [8, 12], "mobilenet_v3_small_crop_orient": [8, 12], "mobilenet_v3_small_page_orient": [8, 12], "mobilenet_v3_small_r": 8, "mobilenetv3": 8, "modal": [4, 6], "mode": 3, "model": [6, 10, 13, 15, 16], "model_nam": [8, 14, 17], "model_path": [15, 17], "moder": 1, "modif": 2, "modifi": [8, 13, 18], "modul": [3, 7, 8, 9, 10, 18], "more": [2, 16, 18], "most": 18, "mozilla": 1, "multi": [4, 8], "multilingu": [6, 14], "multipl": [6, 7, 9, 18], "multipli": 9, "multiprocess": 13, "my": 8, "my_awesome_model": 14, "my_hook": 18, "n": [6, 10], "name": [6, 8, 17, 18], "nation": 1, "natur": [1, 4, 6], "ndarrai": [6, 7, 9, 10], "necessari": [3, 12, 13], "need": [2, 3, 6, 10, 12, 13, 14, 15, 18], "neg": 9, "nest": 18, "network": [4, 6, 8, 17], "neural": [4, 6, 8, 17], "new": [2, 10], "next": [6, 16], "nois": 9, "noisi": [4, 6], "non": [4, 6, 7, 8, 9, 10], "none": [6, 7, 8, 9, 10, 18], "normal": [8, 9], "norwegian": 6, "note": [0, 2, 6, 8, 12, 14, 15, 17], "now": 2, "np": [8, 9, 10, 18], "num_output_channel": 9, "num_sampl": [6, 16], "number": [6, 9, 10, 18], "numpi": [7, 8, 10, 18], "o": 3, "obb": 15, "obj_detect": 14, "object": [6, 7, 10, 15, 18], "objectness_scor": [7, 18], "oblig": 1, "obtain": 18, "occupi": 17, "ocr": [4, 6, 8, 10, 14], "ocr_carea": 18, "ocr_db_crnn": 10, "ocr_lin": 18, "ocr_pag": 18, "ocr_par": 18, "ocr_predictor": [8, 12, 14, 17, 18], "ocrdataset": [6, 16], "ocrmetr": 10, "ocrpredictor": [8, 12], "ocrx_word": 18, "offens": 1, "offici": [1, 8], "offlin": 1, "offset": 9, "onc": 18, "one": [2, 6, 8, 9, 12, 14, 18], "oneof": 9, "ones": [6, 10], "onli": [2, 8, 9, 10, 12, 14, 16, 17, 18], "onlin": 1, "onnx": 15, "onnxruntim": [15, 17], "onnxtr": 17, "opac": 9, "opacity_rang": 9, "open": [1, 2, 14, 17], "opinion": 1, "optic": [4, 18], "optim": [4, 18], "option": [6, 8, 12], "order": [2, 6, 7, 9], "org": [1, 6, 8, 18], "organ": 7, "orient": [1, 7, 8, 11, 15, 18], "orientationpredictor": 8, "other": [1, 2], "otherwis": [1, 7, 10], "our": [2, 8, 18], "out": [2, 8, 9, 10, 18], "outpout": 18, "output": [7, 9, 17], "output_s": [7, 9], "outsid": 13, "over": [6, 10, 18], "overal": [1, 8], "overlai": 7, "overview": 15, "overwrit": 12, "overwritten": 14, "own": 4, "p": [9, 18], "packag": [2, 4, 10, 13, 15, 16, 17], "pad": [6, 8, 9, 18], "page": [3, 6, 8, 10, 12, 18], "page1": 7, "page2": 7, "page_1": 18, 
"page_idx": [7, 18], "page_orientation_predictor": [8, 12], "page_param": 12, "pair": 10, "paper": 8, "par_1_1": 18, "paragraph": 18, "paragraph_break": 18, "param": [9, 18], "paramet": [4, 7, 8, 17], "pars": [4, 6], "parseq": [4, 8, 14, 17, 18], "part": [6, 9, 18], "parti": 3, "partial": 18, "particip": 1, "pass": [6, 7, 8, 12, 18], "password": 7, "patch": [8, 10], "path": [6, 7, 15, 16, 17], "path_to_checkpoint": 12, "path_to_custom_model": 17, "path_to_pt": 12, "pattern": 1, "pdf": [7, 8, 11], "pdfpage": 7, "peopl": 1, "per": [9, 18], "perform": [4, 7, 8, 9, 10, 13, 17, 18], "period": 1, "permiss": 1, "permut": [4, 8], "persian_lett": 6, "person": [1, 16], "phase": 18, "photo": 16, "physic": [1, 7], "pick": 9, "pictur": 7, "pip": [2, 3, 15, 17], "pipelin": 18, "pixel": [7, 9, 18], "pleas": 2, "plot": 10, "plt": 10, "plug": 14, "plugin": 3, "png": 7, "point": 17, "polici": 13, "polish": 6, "polit": 1, "polygon": [6, 10, 18], "pool": 8, "portugues": 6, "posit": [1, 10], "possibl": [2, 10, 14, 18], "post": [1, 18], "postprocessor": 18, "potenti": 8, "power": 4, "ppageno": 18, "pre": [2, 8, 17], "precis": [10, 18], "pred": 10, "pred_box": 10, "pred_label": 10, "predefin": 16, "predict": [7, 8, 10, 18], "predictor": [4, 7, 8, 11, 12, 14, 17], "prefer": 16, "preinstal": 3, "preprocessor": [12, 18], "prerequisit": 14, "present": 11, "preserv": [8, 9, 18], "preserve_aspect_ratio": [7, 8, 9, 12, 18], "pretrain": [4, 8, 10, 12, 17, 18], "pretrained_backbon": [8, 12], "print": 18, "prior": 6, "privaci": 1, "privat": 1, "probabl": 9, "problem": 2, "procedur": 9, "process": [2, 4, 7, 12, 18], "processor": 18, "produc": [11, 18], "product": 17, "profession": 1, "project": [2, 16], "promptli": 1, "proper": 2, "properli": 6, "provid": [1, 2, 4, 14, 15, 16, 18], "public": [1, 4], "publicli": 18, "publish": 1, "pull": 14, "punctuat": 6, "pure": 6, "purpos": 2, "push_to_hf_hub": [8, 14], "py": 14, "pypdfium2": [3, 7], "pyplot": [7, 10], "python": [2, 15], "python3": 14, "pytorch": [3, 4, 8, 9, 12, 14, 17, 18], "q": 2, "qr": [7, 15], "qr_code": 16, "qualiti": 9, "question": 1, "quickli": 4, "quicktour": 11, "r": 18, "race": 1, "ramdisk": 6, "rand": [8, 9, 10, 17, 18], "random": [8, 9, 10, 18], "randomappli": 9, "randombright": 9, "randomcontrast": 9, "randomcrop": 9, "randomgamma": 9, "randomhorizontalflip": 9, "randomhu": 9, "randomjpegqu": 9, "randomli": 9, "randomres": 9, "randomrot": 9, "randomsatur": 9, "randomshadow": 9, "rang": 9, "rassi": 14, "ratio": [8, 9, 18], "raw": [7, 10], "re": 17, "read": [4, 6, 8], "read_html": 7, "read_img_as_numpi": 7, "read_img_as_tensor": 7, "read_pdf": 7, "readi": 17, "real": [4, 8, 9], "reason": [1, 4, 6], "rebuild": 2, "rebuilt": 2, "recal": [10, 18], "receipt": [4, 6, 18], "reco_arch": [8, 12, 14, 17], "reco_b": 18, "reco_model": [12, 14, 17], "reco_param": 12, "reco_predictor": 12, "recogn": 18, "recognit": [6, 10, 11, 12], "recognition_predictor": [8, 18], "recognition_task": [6, 16], "recognitiondataset": [6, 16], "recognitionpredictor": [8, 12], "rectangular": 8, "reduc": [3, 9], "refer": [2, 3, 12, 14, 15, 16, 18], "regardless": 1, "region": 18, "regroup": 10, "regular": 16, "reject": 1, "rel": [7, 9, 10, 18], "relat": 7, "releas": [0, 3], "relev": 15, "religion": 1, "remov": 1, "render": [7, 18], "repo": 8, "repo_id": [8, 14], "report": 1, "repositori": [6, 8, 14], "repres": [1, 17, 18], "represent": [4, 8], "request": [1, 14], "requir": [3, 9, 17], "research": 4, "residu": 8, "resiz": [9, 18], "resnet": 8, "resnet18": [8, 14], "resnet31": 8, "resnet34": 8, 
"resnet50": [8, 14], "resolv": 7, "resolve_block": 18, "resolve_lin": 18, "resourc": 16, "respect": 1, "rest": [2, 9, 10], "restrict": 13, "result": [2, 6, 7, 11, 14, 17, 18], "return": 18, "reusabl": 18, "review": 1, "rgb": [7, 9], "rgb_mode": 7, "rgb_output": 7, "right": [1, 8, 10], "robust": [4, 6], "root": 6, "rotat": [6, 7, 8, 9, 10, 11, 12, 16, 18], "run": [2, 3, 8], "same": [2, 7, 10, 16, 17, 18], "sampl": [6, 16, 18], "sample_transform": 6, "sar": [4, 8], "sar_resnet31": [8, 18], "satur": 9, "save": [8, 16], "scale": [7, 8, 9, 10], "scale_rang": 9, "scan": [4, 6], "scene": [4, 6, 8], "score": [7, 10], "script": [2, 16], "seamless": 4, "seamlessli": [4, 18], "search": 8, "searchabl": 11, "sec": 18, "second": 18, "section": [12, 14, 15, 17, 18], "secur": [1, 13], "see": [1, 2], "seen": 18, "segment": [4, 8, 18], "self": 18, "semant": [4, 8], "send": 18, "sens": 10, "sensit": 16, "separ": 18, "sequenc": [4, 6, 7, 8, 10, 18], "sequenti": [9, 18], "seri": 1, "seriou": 1, "set": [1, 3, 6, 8, 10, 13, 15, 18], "set_global_polici": 17, "sever": [7, 9, 18], "sex": 1, "sexual": 1, "shade": 9, "shape": [4, 7, 8, 9, 10, 18], "share": [13, 16], "shift": 9, "shm": 13, "should": [2, 6, 7, 9, 10], "show": [4, 7, 8, 10, 12, 14, 15], "showcas": [2, 11], "shuffl": [6, 9], "side": 10, "signatur": 7, "signific": 16, "simpl": [4, 8, 17], "simpler": 8, "sinc": [6, 16], "singl": [1, 2, 4, 6], "single_img_doc": 17, "size": [1, 6, 7, 9, 15, 18], "skew": 18, "slack": 2, "slightli": 8, "small": [2, 8, 18], "smallest": 7, "snapshot_download": 8, "snippet": 18, "so": [2, 3, 6, 8, 14, 16], "social": 1, "socio": 1, "some": [3, 11, 14, 16], "someth": 2, "somewher": 2, "sort": 1, "sourc": [6, 7, 8, 9, 10, 14], "space": [1, 18], "span": 18, "spanish": 6, "spatial": [4, 6, 7], "specif": [2, 3, 10, 12, 16, 18], "specifi": [1, 6, 7], "speed": [4, 8, 18], "sphinx": 2, "sroie": [4, 6, 16], "stabl": 3, "stackoverflow": 2, "stage": 4, "standalon": 11, "standard": 9, "start": 6, "state": [4, 10, 15], "static": 10, "statu": 1, "std": [9, 12], "step": 13, "still": 18, "str": [6, 7, 8, 9, 10], "straight": [6, 8, 16, 18], "straighten": 18, "straighten_pag": [8, 12, 18], "straigten_pag": 12, "stream": 7, "street": [4, 6], "strict": 3, "strictli": 10, "string": [6, 7, 10, 18], "strive": 3, "strong": [4, 8], "structur": [17, 18], "subset": [6, 18], "suggest": [2, 14], "sum": 10, "summari": 10, "support": [3, 12, 15, 17, 18], "sustain": 1, "svhn": [4, 6, 16], "svt": [6, 16], "swedish": 6, "symmetr": [8, 9, 18], "symmetric_pad": [8, 9, 18], "synthet": 4, "synthtext": [4, 6, 16], "system": 18, "t": [2, 6, 12, 17, 18], "tabl": [14, 15, 16], "take": [1, 6, 18], "target": [6, 7, 9, 10, 16], "target_s": 6, "task": [4, 6, 8, 14, 16, 18], "task2": 6, "team": 3, "techminde": 3, "templat": [2, 4], "tensor": [6, 7, 9, 18], "tensorflow": [3, 4, 7, 8, 9, 12, 14, 17, 18], "tensorspec": 17, "term": 1, "test": [6, 16], "test_set": 6, "text": [6, 7, 8, 10, 16], "text_output": 18, "textmatch": 10, "textnet": 8, "textnet_bas": 8, "textnet_smal": 8, "textnet_tini": 8, "textract": [4, 18], "textstylebrush": [4, 6], "textual": [4, 6, 7, 8, 18], "tf": [3, 7, 8, 9, 14, 17], "than": [2, 10, 14], "thank": 2, "thei": [1, 10], "them": [6, 18], "thi": [1, 2, 3, 5, 6, 9, 10, 12, 13, 14, 16, 17, 18], "thing": [17, 18], "third": 3, "those": [1, 7, 18], "threaten": 1, "threshold": 18, "through": [1, 9, 15, 16], "tilman": 14, "time": [1, 4, 8, 10, 16], "tini": 8, "titl": [7, 18], "tm": 18, "tmp": 13, "togeth": [2, 7], "tograi": 9, "tool": 16, "top": [10, 17, 18], 
"topic": 2, "torch": [3, 9, 12, 14, 17], "torchvis": 9, "total": 12, "toward": [1, 3], "train": [2, 6, 8, 9, 14, 15, 16, 17, 18], "train_it": [6, 16], "train_load": [6, 16], "train_pytorch": 14, "train_set": [6, 16], "train_tensorflow": 14, "trainabl": [4, 8], "tranform": 9, "transcrib": 18, "transfer": [4, 6], "transfo": 9, "transform": [4, 6, 8], "translat": 1, "troll": 1, "true": [6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18], "truth": 10, "tune": 17, "tupl": [6, 7, 9, 10], "two": [7, 13], "txt": 6, "type": [7, 10, 14, 17, 18], "typic": 18, "u": [1, 2], "ucsd": 6, "udac": 2, "uint8": [7, 8, 10, 18], "ukrainian": 6, "unaccept": 1, "underli": [16, 18], "underneath": 7, "understand": [4, 6, 18], "uniform": [8, 9], "uniformli": 9, "uninterrupt": [7, 18], "union": 10, "unittest": 2, "unlock": 7, "unoffici": 8, "unprofession": 1, "unsolicit": 1, "unsupervis": 4, "unwelcom": 1, "up": [8, 18], "updat": 10, "upgrad": 2, "upper": [6, 9], "uppercas": 16, "url": 7, "us": [1, 2, 3, 6, 8, 10, 11, 12, 13, 14, 15, 18], "usabl": 18, "usag": [13, 17], "use_polygon": [6, 10, 16], "useabl": 18, "user": [4, 7, 11], "utf": 18, "util": 17, "v1": 14, "v3": [8, 14, 18], "valid": 16, "valu": [2, 7, 9, 18], "valuabl": 4, "variabl": 13, "varieti": 6, "veri": 8, "version": [1, 2, 3, 17, 18], "vgg": 8, "vgg16": 14, "vgg16_bn_r": 8, "via": 1, "vietnames": 6, "view": [4, 6], "viewpoint": 1, "violat": 1, "visibl": 1, "vision": [4, 6, 8], "visiondataset": 6, "visiontransform": 8, "visual": [3, 4, 15], "visualize_pag": 10, "vit_": 8, "vit_b": 8, "vitstr": [4, 8, 17], "vitstr_bas": [8, 18], "vitstr_smal": [8, 12, 17, 18], "viz": 3, "vocab": [12, 14, 16, 17, 18], "vocabulari": [6, 12, 14], "w": [7, 8, 9, 10], "w3": 18, "wa": 1, "wai": [1, 4, 16], "want": [2, 17, 18], "warmup": 18, "wasn": 2, "we": [1, 2, 3, 4, 7, 9, 12, 14, 16, 17, 18], "weasyprint": 7, "web": [2, 7], "websit": 6, "welcom": 1, "well": [1, 17], "were": [1, 7, 18], "what": 1, "when": [1, 2, 8], "whenev": 2, "where": [2, 7, 9, 10], "whether": [2, 6, 7, 9, 10, 16, 18], "which": [1, 8, 13, 15, 16, 18], "whichev": 3, "while": [9, 18], "why": 1, "width": [7, 9], "wiki": 1, "wildreceipt": [4, 6, 16], "window": [8, 10], "wish": 2, "within": 1, "without": [1, 6, 8], "wonder": 2, "word": [4, 6, 8, 10, 18], "word_1_1": 18, "word_1_2": 18, "word_1_3": 18, "wordgener": [6, 16], "words_onli": 10, "work": [12, 13, 18], "workflow": 2, "worklow": 2, "world": [10, 18], "worth": 8, "wrap": 18, "wrapper": [6, 9], "write": 13, "written": [1, 7], "www": [1, 7, 18], "x": [7, 9, 10], "x_ascend": 18, "x_descend": 18, "x_i": 10, "x_size": 18, "x_wconf": 18, "xhtml": 18, "xmax": 7, "xmin": 7, "xml": 18, "xml_bytes_str": 18, "xml_element": 18, "xml_output": 18, "xmln": 18, "y": 10, "y_i": 10, "y_j": 10, "yet": 15, "ymax": 7, "ymin": 7, "yolov8": 15, "you": [2, 3, 6, 7, 8, 12, 13, 14, 15, 16, 17, 18], "your": [2, 4, 7, 10, 18], "yoursit": 7, "zero": [9, 10], "zoo": 12, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 6, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 6, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 6, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 6, 
"\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 6, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 6, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 6, "\u00e4\u00f6\u00e4\u00f6": 6, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 6, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 6, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 6, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 6, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 6, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 6, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 6, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 6, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 6, "\u067e\u0686\u06a2\u06a4\u06af": 6, "\u0905": 6, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 6, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 6, 
"\u0950": 6, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 6, "\u09bd": 6, "\u09ce": 6, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 6}, "titles": ["Changelog", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 2, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 1], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 1], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 1], "31": 0, "4": [0, 1], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 18, "approach": 18, "architectur": 18, "arg": [6, 7, 8, 9, 10], "artefact": 7, "artefactdetect": 15, "attribut": 1, "avail": [15, 16, 18], "aw": 13, "ban": 1, "block": 7, "bug": 2, "changelog": 0, "choos": [16, 18], "classif": [8, 12, 14], "code": [1, 2], "codebas": 2, "commit": 2, "commun": 14, "compos": 9, "conda": 3, "conduct": 1, "connect": 2, "continu": 2, "contrib": 5, "contribut": [2, 5, 15], "contributor": 1, "convent": 14, "correct": 1, "coven": 1, "custom": [6, 12], "data": 16, "dataload": 6, "dataset": [4, 6, 16], "detect": [4, 8, 14, 16, 18], "develop": 2, "do": 18, "doctr": [2, 4, 5, 6, 7, 8, 9, 10, 11], "document": [2, 4, 7], "end": 18, "enforc": 1, "evalu": 10, "export": 17, "factori": 8, "featur": [2, 4], "feedback": 2, "file": 7, "from": 14, "gener": [6, 16], "git": 3, "guidelin": 1, "half": 17, "hub": 14, "huggingfac": 14, "i": 18, "infer": 17, "instal": [2, 3], "integr": [2, 15], "io": 7, "lambda": 13, "let": 2, "line": 7, "linux": 3, "load": [12, 14, 16], "loader": 6, "main": 4, "mode": 2, "model": [4, 8, 12, 14, 17, 18], "modifi": 2, "modul": [5, 15], "name": 14, "notebook": 11, "object": 16, "ocr": [16, 18], "onli": 3, "onnx": 17, "optim": 17, "option": 18, "orient": 12, "our": 1, "output": 18, "own": [12, 16], "packag": 3, "page": 7, "perman": 1, "pipelin": 15, "pledg": 1, "precis": 17, "predictor": 18, "prepar": 17, "prerequisit": 3, "pretrain": 14, "push": 14, "python": 3, "qualiti": 2, "question": 2, "read": 7, "readi": 16, "recognit": [4, 8, 14, 16, 18], "report": 2, "request": 2, "respons": 1, "return": [6, 7, 8, 10], "right": 18, "scope": 1, "share": 14, "should": 18, "stage": 18, "standard": 1, "structur": [2, 7], "style": 2, "support": [4, 5, 6, 9], "synthet": [6, 16], "task": 10, "temporari": 1, "test": 2, "text": [4, 18], "train": 12, "transform": 9, "two": 18, "unit": 2, "us": [16, 17], "util": 10, "v0": 0, "verif": 2, "via": 3, "visual": 10, "vocab": 6, "warn": 1, "what": 18, "word": 7, "your": [12, 14, 15, 16, 17], "zoo": [4, 8]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[1, "correction"]], "2. Warning": [[1, "warning"]], "3. Temporary Ban": [[1, "temporary-ban"]], "4. Permanent Ban": [[1, "permanent-ban"]], "AWS Lambda": [[13, null]], "Advanced options": [[18, "advanced-options"]], "Args:": [[6, "args"], [6, "id4"], [6, "id7"], [6, "id10"], [6, "id13"], [6, "id16"], [6, "id19"], [6, "id22"], [6, "id25"], [6, "id29"], [6, "id32"], [6, "id37"], [6, "id40"], [6, "id46"], [6, "id49"], [6, "id50"], [6, "id51"], [6, "id54"], [6, "id57"], [6, "id60"], [6, "id61"], [7, "args"], [7, "id2"], [7, "id3"], [7, "id4"], [7, "id5"], [7, "id6"], [7, "id7"], [7, "id10"], [7, "id12"], [7, "id14"], [7, "id16"], [7, "id20"], [7, "id24"], [7, "id28"], [8, "args"], [8, "id3"], [8, "id8"], [8, "id13"], [8, "id17"], [8, "id21"], [8, "id26"], [8, "id31"], [8, "id36"], [8, "id41"], [8, "id46"], [8, "id50"], [8, "id54"], [8, "id59"], [8, "id63"], [8, "id68"], [8, "id73"], [8, "id77"], [8, "id81"], [8, "id85"], [8, "id90"], [8, "id95"], [8, "id99"], [8, "id104"], [8, "id109"], [8, "id114"], [8, "id119"], [8, "id123"], [8, "id127"], [8, "id132"], [8, "id137"], [8, "id142"], [8, "id146"], [8, "id150"], [8, "id155"], [8, "id159"], [8, "id163"], [8, "id167"], [8, "id169"], [8, "id171"], [8, "id173"], [9, "args"], [9, "id1"], [9, "id2"], [9, "id3"], [9, "id4"], [9, "id5"], [9, "id6"], [9, "id7"], [9, "id8"], [9, "id9"], [9, "id10"], [9, "id11"], [9, "id12"], [9, "id13"], [9, "id14"], [9, "id15"], [9, "id16"], [9, "id17"], [9, "id18"], [9, "id19"], [10, "args"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"]], "Artefact": [[7, "artefact"]], "ArtefactDetection": [[15, "artefactdetection"]], "Attribution": [[1, "attribution"]], "Available Datasets": [[16, "available-datasets"]], "Available architectures": [[18, "available-architectures"], [18, "id1"], [18, "id2"]], "Available contribution modules": [[15, "available-contribution-modules"]], "Block": [[7, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[16, null]], "Choosing the right model": [[18, null]], "Classification": [[14, "classification"]], "Code quality": [[2, "code-quality"]], "Code style verification": [[2, "code-style-verification"]], "Codebase structure": [[2, "codebase-structure"]], "Commits": [[2, "commits"]], "Composing transformations": [[9, "composing-transformations"]], "Continuous Integration": [[2, "continuous-integration"]], "Contributing to docTR": [[2, null]], "Contributor Covenant Code of Conduct": [[1, null]], "Custom dataset loader": [[6, "custom-dataset-loader"]], "Custom orientation classification models": [[12, "custom-orientation-classification-models"]], "Data Loading": [[16, "data-loading"]], "Dataloader": [[6, "dataloader"]], "Detection": [[14, "detection"], [16, "detection"]], "Detection predictors": [[18, "detection-predictors"]], "Developer mode installation": [[2, "developer-mode-installation"]], "Developing docTR": [[2, "developing-doctr"]], "Document": [[7, "document"]], "Document structure": [[7, "document-structure"]], "End-to-End OCR": [[18, "end-to-end-ocr"]], "Enforcement": [[1, "enforcement"]], "Enforcement Guidelines": [[1, "enforcement-guidelines"]], "Enforcement Responsibilities": [[1, "enforcement-responsibilities"]], "Export to ONNX": [[17, "export-to-onnx"]], "Feature requests & bug report": [[2, "feature-requests-bug-report"]], "Feedback": [[2, "feedback"]], "File reading": [[7, "file-reading"]], "Half-precision": [[17, "half-precision"]], "Installation": [[3, null]], 
"Integrate contributions into your pipeline": [[15, null]], "Let\u2019s connect": [[2, "let-s-connect"]], "Line": [[7, "line"]], "Loading from Huggingface Hub": [[14, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[12, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[12, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[4, "main-features"]], "Model optimization": [[17, "model-optimization"]], "Model zoo": [[4, "model-zoo"]], "Modifying the documentation": [[2, "modifying-the-documentation"]], "Naming conventions": [[14, "naming-conventions"]], "OCR": [[16, "ocr"]], "Object Detection": [[16, "object-detection"]], "Our Pledge": [[1, "our-pledge"]], "Our Standards": [[1, "our-standards"]], "Page": [[7, "page"]], "Preparing your model for inference": [[17, null]], "Prerequisites": [[3, "prerequisites"]], "Pretrained community models": [[14, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[14, "pushing-to-the-huggingface-hub"]], "Questions": [[2, "questions"]], "Recognition": [[14, "recognition"], [16, "recognition"]], "Recognition predictors": [[18, "recognition-predictors"]], "Returns:": [[6, "returns"], [7, "returns"], [7, "id11"], [7, "id13"], [7, "id15"], [7, "id19"], [7, "id23"], [7, "id27"], [7, "id31"], [8, "returns"], [8, "id6"], [8, "id11"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id29"], [8, "id34"], [8, "id39"], [8, "id44"], [8, "id49"], [8, "id53"], [8, "id57"], [8, "id62"], [8, "id66"], [8, "id71"], [8, "id76"], [8, "id80"], [8, "id84"], [8, "id88"], [8, "id93"], [8, "id98"], [8, "id102"], [8, "id107"], [8, "id112"], [8, "id117"], [8, "id122"], [8, "id126"], [8, "id130"], [8, "id135"], [8, "id140"], [8, "id145"], [8, "id149"], [8, "id153"], [8, "id158"], [8, "id162"], [8, "id166"], [8, "id168"], [8, "id170"], [8, "id172"], [10, "returns"]], "Scope": [[1, "scope"]], "Share your model with the community": [[14, null]], "Supported Vocabs": [[6, "supported-vocabs"]], "Supported contribution modules": [[5, "supported-contribution-modules"]], "Supported datasets": [[4, "supported-datasets"]], "Supported transformations": [[9, "supported-transformations"]], "Synthetic dataset generator": [[6, "synthetic-dataset-generator"], [16, "synthetic-dataset-generator"]], "Task evaluation": [[10, "task-evaluation"]], "Text Detection": [[18, "text-detection"]], "Text Recognition": [[18, "text-recognition"]], "Text detection models": [[4, "text-detection-models"]], "Text recognition models": [[4, "text-recognition-models"]], "Train your own model": [[12, null]], "Two-stage approaches": [[18, "two-stage-approaches"]], "Unit tests": [[2, "unit-tests"]], "Use your own datasets": [[16, "use-your-own-datasets"]], "Using your ONNX exported model": [[17, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[3, "via-conda-only-for-linux"]], "Via Git": [[3, "via-git"]], "Via Python Package": [[3, "via-python-package"]], "Visualization": [[10, "visualization"]], "What should I do with the output?": [[18, "what-should-i-do-with-the-output"]], "Word": [[7, "word"]], "docTR Notebooks": [[11, null]], "docTR Vocabs": [[6, "id62"]], "docTR: Document Text Recognition": [[4, null]], "doctr.contrib": [[5, null]], "doctr.datasets": [[6, null], [6, "datasets"]], "doctr.io": [[7, null]], "doctr.models": [[8, null]], "doctr.models.classification": [[8, "doctr-models-classification"]], "doctr.models.detection": [[8, "doctr-models-detection"]], "doctr.models.factory": [[8, 
"doctr-models-factory"]], "doctr.models.recognition": [[8, "doctr-models-recognition"]], "doctr.models.zoo": [[8, "doctr-models-zoo"]], "doctr.transforms": [[9, null]], "doctr.utils": [[10, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[7, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[7, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[9, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[6, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[9, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[9, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[6, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[8, "doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[6, "doctr.datasets.loader.DataLoader", false]], 
"db_mobilenet_v3_large() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[7, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[8, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[6, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[6, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[7, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[7, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[6, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[8, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[7, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[6, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[9, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[9, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[6, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[6, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[6, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[6, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[6, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[8, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[9, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[7, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[8, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[10, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[8, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[8, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[8, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[6, "doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[8, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() 
… (remainder of the single-line minified Search.setIndex({…}) payload omitted — the regenerated Sphinx search index for latest/searchindex.js, enumerating the indexed terms, section titles, and Python objects of the docs)
\ No newline at end of file
diff --git a/using_doctr/custom_models_training.html b/using_doctr/custom_models_training.html
index 580b4368b7..e664c6a950 100644
--- a/using_doctr/custom_models_training.html
+++ b/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -615,7 +615,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/using_doctr/running_on_aws.html b/using_doctr/running_on_aws.html
index ddb0c3c80f..81c38b49f5 100644
--- a/using_doctr/running_on_aws.html
+++ b/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -358,7 +358,7 @@ AWS Lambda
-
+
diff --git a/using_doctr/sharing_models.html b/using_doctr/sharing_models.html
index 07a3b2f2a3..4f5d1d68a5 100644
--- a/using_doctr/sharing_models.html
+++ b/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -540,7 +540,7 @@ Recognition
-
+
diff --git a/using_doctr/using_contrib_modules.html b/using_doctr/using_contrib_modules.html
index b4a10925e6..cf282ff3a4 100644
--- a/using_doctr/using_contrib_modules.html
+++ b/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -411,7 +411,7 @@ ArtefactDetection
-
+
diff --git a/using_doctr/using_datasets.html b/using_doctr/using_datasets.html
index 4a52df36ba..e30b6d6459 100644
--- a/using_doctr/using_datasets.html
+++ b/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -638,7 +638,7 @@ Data Loading
-
+
diff --git a/using_doctr/using_model_export.html b/using_doctr/using_model_export.html
index 2b30ee63a1..ad9d09ed4c 100644
--- a/using_doctr/using_model_export.html
+++ b/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -463,7 +463,7 @@ Using your ONNX exported model
-
+
diff --git a/using_doctr/using_models.html b/using_doctr/using_models.html
index 13cb06116b..5c80dbf62d 100644
--- a/using_doctr/using_models.html
+++ b/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1249,7 +1249,7 @@ Advanced options
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/cord.html b/v0.1.0/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.0/_modules/doctr/datasets/cord.html
+++ b/v0.1.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/detection.html b/v0.1.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.0/_modules/doctr/datasets/detection.html
+++ b/v0.1.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/funsd.html b/v0.1.0/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.0/_modules/doctr/datasets/funsd.html
+++ b/v0.1.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic03.html b/v0.1.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.0/_modules/doctr/datasets/ic03.html
+++ b/v0.1.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic13.html b/v0.1.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.0/_modules/doctr/datasets/ic13.html
+++ b/v0.1.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiit5k.html b/v0.1.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiithws.html b/v0.1.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/imgur5k.html b/v0.1.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/loader.html b/v0.1.0/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.0/_modules/doctr/datasets/loader.html
+++ b/v0.1.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/mjsynth.html b/v0.1.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ocr.html b/v0.1.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.0/_modules/doctr/datasets/ocr.html
+++ b/v0.1.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/recognition.html b/v0.1.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.0/_modules/doctr/datasets/recognition.html
+++ b/v0.1.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/sroie.html b/v0.1.0/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.0/_modules/doctr/datasets/sroie.html
+++ b/v0.1.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svhn.html b/v0.1.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.0/_modules/doctr/datasets/svhn.html
+++ b/v0.1.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svt.html b/v0.1.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.0/_modules/doctr/datasets/svt.html
+++ b/v0.1.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/synthtext.html b/v0.1.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/utils.html b/v0.1.0/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.0/_modules/doctr/datasets/utils.html
+++ b/v0.1.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/wildreceipt.html b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.0/_modules/doctr/io/elements.html b/v0.1.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.0/_modules/doctr/io/elements.html
+++ b/v0.1.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.0/_modules/doctr/io/html.html b/v0.1.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.0/_modules/doctr/io/html.html
+++ b/v0.1.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/base.html b/v0.1.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.0/_modules/doctr/io/image/base.html
+++ b/v0.1.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/tensorflow.html b/v0.1.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/io/pdf.html b/v0.1.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.0/_modules/doctr/io/pdf.html
+++ b/v0.1.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.0/_modules/doctr/io/reader.html b/v0.1.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.0/_modules/doctr/io/reader.html
+++ b/v0.1.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
diff --git a/v0.1.0/_modules/doctr/models/classification/zoo.html b/v0.1.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
diff --git a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
diff --git a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
diff --git a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
diff --git a/v0.1.0/_modules/doctr/models/detection/zoo.html b/v0.1.0/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
diff --git a/v0.1.0/_modules/doctr/models/factory/hub.html b/v0.1.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.0/_modules/doctr/models/factory/hub.html
+++ b/v0.1.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
diff --git a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
diff --git a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
diff --git a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
diff --git a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
diff --git a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
diff --git a/v0.1.0/_modules/doctr/models/recognition/zoo.html b/v0.1.0/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
diff --git a/v0.1.0/_modules/doctr/models/zoo.html b/v0.1.0/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.0/_modules/doctr/models/zoo.html
+++ b/v0.1.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
diff --git a/v0.1.0/_modules/doctr/transforms/modules/base.html b/v0.1.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
diff --git a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
diff --git a/v0.1.0/_modules/doctr/utils/metrics.html b/v0.1.0/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.0/_modules/doctr/utils/metrics.html
+++ b/v0.1.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
diff --git a/v0.1.0/_modules/doctr/utils/visualization.html b/v0.1.0/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.0/_modules/doctr/utils/visualization.html
+++ b/v0.1.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
diff --git a/v0.1.0/_modules/index.html b/v0.1.0/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.0/_modules/index.html
+++ b/v0.1.0/_modules/index.html
@@ -13,7 +13,7 @@
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
diff --git a/v0.1.0/_sources/getting_started/installing.rst.txt b/v0.1.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.0/_sources/getting_started/installing.rst.txt
+++ b/v0.1.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.0/_static/basic.css b/v0.1.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.0/_static/basic.css
+++ b/v0.1.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.0/_static/doctools.js b/v0.1.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.0/_static/doctools.js
+++ b/v0.1.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.0/_static/language_data.js b/v0.1.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.0/_static/language_data.js
+++ b/v0.1.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
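The stopword list above is the language-specific data that the search pipeline consults when tokenizing queries. As a minimal illustrative sketch (not code from this diff: the tokenize helper and the filtering policy are assumptions), a search layer could drop stopwords from a user query like this:

// Illustrative only: split a raw query into lowercase terms, then remove any
// term found in the `stopwords` array shipped by language_data.js.
const tokenize = (query) => query.toLowerCase().split(/\s+/).filter(Boolean);
const searchTerms = (query) =>
  tokenize(query).filter((term) => !stopwords.includes(term));
// searchTerms("how to train the model") -> ["how", "train", "model"]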
diff --git a/v0.1.0/_static/searchtools.js b/v0.1.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.0/_static/searchtools.js
+++ b/v0.1.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
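Taken together, the searchtools.js changes above thread a new kind field (one of the SearchResultKind values) through every result tuple, so each result is now a 7-tuple ending in kind, and _displayItem exposes it to themes as a kind-${kind} CSS class on the result's list item. The commented-out scorer hook in the diff shows that a page may predefine Scorer before searchtools.js loads; as a hedged sketch (the per-kind boost values and the fallback of 0 are illustrative assumptions, not Sphinx defaults), a theme could weight results by kind like this:

// Defined before searchtools.js runs, so its `typeof Scorer === "undefined"`
// guard keeps this object instead of installing the default scorer.
const Scorer = {
  // `result` is the 7-tuple shown in the diff; `kind` is the new last element.
  score: (result) => {
    const [docname, title, anchor, descr, score, filename, kind] = result;
    // Illustrative per-kind boosts on top of the base score.
    const boost = { title: 15, object: 10, index: 5, text: 0 }[kind] ?? 0;
    return score + boost;
  },
  // Other Scorer fields (objPrio, term, partialTerm, ...) omitted for brevity;
  // a real override must supply every field searchtools.js reads.
};

Themes can target the same information in CSS via the new classes, e.g. styling li.kind-title differently from li.kind-text.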
diff --git a/v0.1.0/changelog.html b/v0.1.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.0/changelog.html
+++ b/v0.1.0/changelog.html
@@ -14,7 +14,7 @@
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
diff --git a/v0.1.0/community/resources.html b/v0.1.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.0/community/resources.html
+++ b/v0.1.0/community/resources.html
@@ -14,7 +14,7 @@
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
diff --git a/v0.1.0/contributing/code_of_conduct.html b/v0.1.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.0/contributing/code_of_conduct.html
+++ b/v0.1.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
diff --git a/v0.1.0/contributing/contributing.html b/v0.1.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.0/contributing/contributing.html
+++ b/v0.1.0/contributing/contributing.html
@@ -14,7 +14,7 @@
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
diff --git a/v0.1.0/genindex.html b/v0.1.0/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.0/genindex.html
+++ b/v0.1.0/genindex.html
@@ -13,7 +13,7 @@
Index - docTR documentation
@@ -756,7 +756,7 @@ W
diff --git a/v0.1.0/getting_started/installing.html b/v0.1.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.0/getting_started/installing.html
+++ b/v0.1.0/getting_started/installing.html
@@ -14,7 +14,7 @@
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
diff --git a/v0.1.0/index.html b/v0.1.0/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.0/index.html
+++ b/v0.1.0/index.html
@@ -14,7 +14,7 @@
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
diff --git a/v0.1.0/modules/contrib.html b/v0.1.0/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.0/modules/contrib.html
+++ b/v0.1.0/modules/contrib.html
@@ -14,7 +14,7 @@
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
diff --git a/v0.1.0/modules/datasets.html b/v0.1.0/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.0/modules/datasets.html
+++ b/v0.1.0/modules/datasets.html
@@ -14,7 +14,7 @@
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
diff --git a/v0.1.0/modules/io.html b/v0.1.0/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.0/modules/io.html
+++ b/v0.1.0/modules/io.html
@@ -14,7 +14,7 @@
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
diff --git a/v0.1.0/modules/models.html b/v0.1.0/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.0/modules/models.html
+++ b/v0.1.0/modules/models.html
@@ -14,7 +14,7 @@
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
diff --git a/v0.1.0/modules/transforms.html b/v0.1.0/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.0/modules/transforms.html
+++ b/v0.1.0/modules/transforms.html
@@ -14,7 +14,7 @@
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
Search - docTR documentation
@@ -340,7 +340,7 @@
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
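The searchindex.js payload above is the object Sphinx passes to Search.setIndex: "objtypes" maps numeric ids to domain:type pairs (e.g. "py:method") and "objnames" maps the same ids to display labels. A minimal sketch (not present in the diff) of how those two tables resolve a result's type label; the local `index` binding is hypothetical, since Sphinx normally feeds the payload straight to Search.setIndex:

// Minimal sketch, not part of the diff: resolving an object's type label
// from the Search.setIndex payload above. `index` is a hypothetical local
// binding of that payload object.
const objectTypeLabel = (index, typeId) => index.objnames[String(typeId)][2];
// objectTypeLabel(index, 0) -> "Python class"
// objectTypeLabel(index, 2) -> "Python method"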
diff --git a/v0.1.0/using_doctr/custom_models_training.html b/v0.1.0/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.0/using_doctr/custom_models_training.html
+++ b/v0.1.0/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.0/using_doctr/running_on_aws.html b/v0.1.0/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.0/using_doctr/running_on_aws.html
+++ b/v0.1.0/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
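Since each result tuple now carries a seventh `kind` element, a theme that overrides `Scorer.score` (the documented extension point shown in the commented example above) can weight results by type. A minimal sketch of hypothetical theme-side code, not part of this diff; the string "title" matches the `SearchResultKind.title` value introduced in the next hunk:

const Scorer = {
  // Boost matches found in page titles; leave every other kind unchanged.
  score: (result) => {
    const [docname, title, anchor, descr, score, filename, kind] = result;
    return kind === "title" ? score * 2 : score;
  },
  // NOTE: a complete override must also supply the remaining weight fields
  // from the default Scorer (objPrio, term weights, etc.), omitted here.
};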
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
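Each static getter simply returns its own name as a string, so the class acts as a lightweight string enum:

// e.g. the class behaves as a string enum:
SearchResultKind.title;  // → "title"
// Search.query() pushes one of the four values with every result (see the hunks below).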
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
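Because `_displayItem` now tags each list item with a `kind-…` class, a theme can target result types after rendering. A sketch of hypothetical theme code; the `#search-results ul.search` selector mirrors the container and class set up in `Search` further down this diff:

// Hypothetical theme script: render API-object hits in a monospace font.
document
  .querySelectorAll("#search-results ul.search li.kind-object")
  .forEach((li) => {
    li.style.fontFamily = "monospace";
  });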
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
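The switch from `_()` to `Documentation.ngettext(singular, plural, n)` lets translation catalogs supply proper plural forms instead of the "page(s)" workaround. A sketch of the untranslated fallback behaviour (an assumption about the doctools.js internals, with hypothetical template strings):

Documentation.ngettext("one page", "${n} pages", 1);  // → "one page"
Documentation.ngettext("one page", "${n} pages", 3);  // → "${n} pages"
// The caller then fills the placeholder, e.g. .replace('${resultCount}', resultCount).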
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
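The comparator body is unchanged by this diff; for context, the comment above implies it sorts ascending by score (index 4 of the tuple) so that `pop()` yields the best match first, breaking ties reverse-alphabetically by title so popped items come out in alphabetical order. A sketch consistent with that comment, not shown in this hunk:

const _orderResultsByScoreThenName = (a, b) => {
  const [leftScore, rightScore] = [a[4], b[4]];
  if (leftScore !== rightScore) return leftScore - rightScore; // ascending: best last
  // Equal scores: reverse-alphabetical by lower-cased title, so pop() emits A before B.
  const [leftTitle, rightTitle] = [a[1].toLowerCase(), b[1].toLowerCase()];
  return leftTitle > rightTitle ? -1 : leftTitle < rightTitle ? 1 : 0;
};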
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
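For context, a minimal usage sketch of the refactored CORD signature above (a sketch only: the download location, cache behavior, and exact target shapes depend on your local setup):

from doctr.datasets import CORD

# Default mode: each target is a dict of straight boxes and text labels
train_set = CORD(train=True, download=True)
img, target = train_set[0]
print(target["boxes"].shape, len(target["labels"]))  # (N, 4) boxes, N labels

# use_polygons=True swaps xmin/ymin/xmax/ymax boxes for (4, 2) corner polygons
poly_set = CORD(train=True, download=True, use_polygons=True)

# recognition_task=True yields (word crop, text) pairs instead of full pages
reco_set = CORD(train=True, download=True, recognition_task=True)
crop, word = reco_set[0]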
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
- doctr.datasets.core - docTR documentation
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
\ No newline at end of file
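The deleted module above defined the original VisionDataset contract: download the archive, verify its SHA256, optionally extract it, then let subclasses populate self.data. Per the imports elsewhere in this diff, the class now lives in doctr.datasets.datasets. A hypothetical subclass sketch against that contract; the URL, file name, and SHA256 below are placeholders, not a real dataset:

from doctr.datasets.datasets import VisionDataset  # relocated from .core


class TinyDataset(VisionDataset):
    # Placeholder values for illustration only
    URL = "https://example.com/tiny_dataset.zip"
    SHA256 = "0" * 64

    def __init__(self, download: bool = False, **kwargs) -> None:
        # url, file_name, file_hash, extract_archive, then the usual kwargs
        super().__init__(self.URL, "tiny_dataset.zip", self.SHA256, True, download=download, **kwargs)
        # Subclasses fill self.data with (sample, target) pairs
        self.data = []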
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
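A short sketch of the three FUNSD modes introduced above (assuming the dataset downloads into your local cache):

from doctr.datasets import FUNSD

# Full targets: dict(boxes=..., labels=...) per image
full_set = FUNSD(train=True, download=True)

# Detection-only: the target is just the box array
det_set = FUNSD(train=True, download=True, detection_task=True)

# Recognition-only: (crop, text) pairs; labels containing unsupported
# glyphs (e.g. checkboxes) are filtered out, as in the loop above
reco_set = FUNSD(train=False, download=True, recognition_task=True)

# Setting both flags raises the ValueError shown above
# FUNSD(train=True, download=True, recognition_task=True, detection_task=True)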
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before passing them to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
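To illustrate the new collate_fn parameter and __len__ added above, a small sketch assuming the TensorFlow backend (CORD is only an example dataset here; the custom collate stacks images but keeps targets as a plain list, unlike default_collate, which stacks every element):

import tensorflow as tf

from doctr.datasets import CORD, DataLoader

train_set = CORD(train=True, download=True)


def my_collate(samples):
    # Stack images into one tensor, leave the target dicts as a list
    images, targets = zip(*samples)
    return tf.stack(images, axis=0), list(targets)


train_loader = DataLoader(train_set, shuffle=True, batch_size=32, collate_fn=my_collate)
print(len(train_loader))  # number of batches, via the new __len__
images, targets = next(iter(train_loader))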
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates: 8 flat values -> a (4, 2) array of
+ # (x, y) corners (top left, top right, bottom right, bottom left); blank rows were filtered above
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
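The coordinate handling above can be checked in isolation. A dummy annotation row (values invented) run through the same reshaping logic:

import numpy as np

# 8 flat coords -> (4, 2) polygon: top left, top right, bottom right, bottom left
row = ["10", "20", "110", "20", "110", "60", "10", "60", "TOTAL", "42.00"]
poly = np.array(list(map(int, row[:8])), dtype=np.float32).reshape((4, 2))

# Straight-box fallback: per-axis min/max -> xmin, ymin, xmax, ymax
straight = np.concatenate((poly.min(axis=0), poly.max(axis=0)))

# Labels may themselves contain commas, hence the join over row[8:]
label = ",".join(row[8:])
print(poly.shape, straight, label)  # (4, 2) [ 10.  20. 110.  60.] TOTAL,42.00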
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionnary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char is still not in vocab, use the unknown character
char = unknown_char
translated += char
return translated
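A quick sketch of the behaviour (assuming the "french" vocabulary is registered in VOCABS; the exact output depends on the vocab contents):
    >>> from doctr.datasets.utils import translate
    >>> # characters absent from the vocab are NFD-normalized to ASCII, or replaced by '■' if that fails
    >>> translate("Ĳ café", vocab_name="french")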
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
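A short sketch of the padding behaviour with a toy vocabulary (index 3 is chosen outside the vocab, as required for EOS):
    >>> from doctr.datasets.utils import encode_sequences
    >>> encode_sequences(["ab", "c"], vocab="abc", eos=3)
    array([[0, 1, 3],
           [2, 3, 3]], dtype=int32)
With sos or pad set (also outside the vocab indices), each row additionally receives a leading SOS symbol, or a single EOS followed by PAD symbols, within the same target_size budget.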
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: a array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
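A minimal sketch (the image path is a placeholder; boxes are absolute pixel coordinates):
    >>> import numpy as np
    >>> from doctr.datasets.utils import crop_bboxes_from_image
    >>> boxes = np.array([[10, 10, 50, 30], [60, 10, 120, 40]])  # (N, 4) straight boxes
    >>> crops = crop_bboxes_from_image("path/to/your/img.jpg", geoms=boxes)  # list of N crops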
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
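A hedged sketch of the grouping logic (img is assumed to be an already-loaded backend image tensor accepted by get_img_shape; the class names are hypothetical):
    >>> import numpy as np
    >>> from doctr.datasets.utils import pre_transform_multiclass
    >>> polys = np.random.randint(0, 100, (3, 4, 2))  # absolute polygons of shape (N, 4, 2)
    >>> img, boxes_dict = pre_transform_multiclass(img, (polys, ["words", "words", "titles"]))
    >>> sorted(boxes_dict)  # ['titles', 'words'], each mapped to a stacked (k, 4, 2) array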
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- doctr.documents.elements - docTR documentation
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
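To make the hierarchy concrete, a minimal sketch building a one-word document by hand (the geometry values are arbitrary relative coordinates):
    >>> word = Word("hello", 0.99, ((0.1, 0.1), (0.3, 0.2)))
    >>> line = Line([word])  # geometry resolved from the word
    >>> block = Block(lines=[line])
    >>> page = Page(blocks=[block], page_idx=0, dimensions=(842, 595))
    >>> doc = Document(pages=[page])
    >>> doc.render()
    'hello'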
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- doctr.documents.reader - docTR documentation
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document, opened as a fitz.Document
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Open the document with fitz; pages are rendered to numpy arrays later via convert_page_to_numpy
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 pdf;
- to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
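In short, the transform matrix only rescales the 72 dpi PDF coordinate space; a hedged usage sketch (the path is a placeholder):
    >>> doc = read_pdf("path/to/your/doc.pdf")
    >>> page_img = convert_page_to_numpy(doc[0], output_size=(1024, 726))  # H x W x 3 uint8 array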
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to expand (unclip) polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: the polygon to expand, as an array of 2D coordinates
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
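The expansion distance follows the unclipping rule of the DB paper: distance = area * unclip_ratio / perimeter. For instance, a 100 x 20 rectangle with the default unclip_ratio of 1.5 is offset by 100 * 20 * 1.5 / 240 = 12.5 pixels on each side before the enclosing bounding rectangle is taken.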
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1):
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature maps is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
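Geometrically, cosin is the negated cosine of the angle apb given by the law of cosines, so the main branch evaluates |pa| * |pb| * sin(apb) / |ab|, i.e. twice the area of triangle apb over its base: the perpendicular distance from each grid point p to the line (ab). Where cosin < 0 (p lies outside the circle of diameter [ab]), the distance to the nearest endpoint is substituted instead; this approximates the point-to-segment distance and is accurate near the segment, the only region kept after the clipping in draw_thresh_map.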
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon : array of coord., to draw the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
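Putting the pieces together, the objective reduces to loss = 10 * L1(threshold map) + 5 * balanced_BCE(probability map) + dice(approximate binary map), with hard negative mining capped at 3 negatives per positive, matching the weighting used in the DB paper.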
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Tuple[int, int, int] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num): # label 0 is the background component
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
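As a shape sketch, assuming a 512 x 512 x 3 input to the full model: the stem outputs a 128 x 128 x 64 map, the encoders then yield 64 x 64 x 64, 32 x 32 x 128, 16 x 16 x 256 and 8 x 8 x 512, and each decoder upsamples by 2 while reducing channels so that the skip sums y_4 + x_3, y_3 + x_2 and y_2 + x_1 line up, leaving y_1 at 128 x 128 x 64.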
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
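The mask simply restricts the binary cross-entropy to valid pixels; an equivalent toy computation with explicit `tf.boolean_mask` (shapes are illustrative):

>>> import tensorflow as tf
>>> logits = tf.random.normal((1, 4, 4))
>>> target = tf.round(tf.random.uniform((1, 4, 4)))
>>> mask = tf.ones((1, 4, 4), dtype=tf.bool)  # False where pixels should be ignored
>>> loss = tf.reduce_mean(tf.keras.losses.binary_crossentropy(
...     tf.boolean_mask(target, mask), tf.boolean_mask(logits, mask), from_logits=True))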
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
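Since `arch` now also accepts a model instance, a pre-built detector can be wrapped directly; a sketch (the architecture choice is illustrative):

>>> from doctr.models import detection, detection_predictor
>>> model = detection.db_resnet50(pretrained=True)
>>> predictor = detection_predictor(arch=model, assume_straight_pages=True)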
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
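The bytes returned by these converters can be loaded straight into a TFLite interpreter; a usage sketch, assuming `serialized_model` holds the output of one of the functions above:

>>> import tensorflow as tf
>>> interpreter = tf.lite.Interpreter(model_content=serialized_model)
>>> interpreter.allocate_tensors()
>>> interpreter.get_input_details()[0]["dtype"]  # numpy.int8 for the quantized model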
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill([logits.shape[0]], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
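A toy run of the same greedy decoding on random logits (5 classes, the last one acting as the CTC blank):

>>> import tensorflow as tf
>>> logits = tf.random.normal((2, 10, 5))  # batch x time x classes
>>> decoded, _ = tf.nn.ctc_greedy_decoder(
...     tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),  # time-major input
...     tf.fill([2], 10),  # one sequence length per sample
...     merge_repeated=True,
... )
>>> tf.sparse.to_dense(decoded[0], default_value=4).shape[0]  # one row per sample
2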
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Decodes the raw model output with CTC, then maps the predicted indices back to
- characters with the label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of target strings, encoded internally into gt labels and their sequence lengths
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
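For context, a self-contained toy call of `tf.nn.ctc_loss` mirroring the batch-major arguments used above (values are illustrative):

>>> import tensorflow as tf
>>> labels = tf.constant([[1, 2, 3]])     # encoded ground truth
>>> logits = tf.random.normal((1, 8, 5))  # B x T x (num_classes + blank)
>>> loss = tf.nn.ctc_loss(
...     labels, logits,
...     label_length=tf.constant([3]), logit_length=tf.constant([8]),
...     logits_time_major=False, blank_index=4,
... )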
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
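The glimpse is an attention-weighted spatial sum of the feature map; a plain-TensorFlow sketch of the same computation:

>>> import tensorflow as tf
>>> features = tf.random.normal((1, 4, 4, 8))
>>> scores = tf.random.normal((1, 4, 4, 1))  # unnormalized attention logits
>>> attention = tf.reshape(tf.nn.softmax(tf.reshape(scores, (1, -1))), (1, 4, 4, 1))
>>> tf.reduce_sum(features * attention, axis=[1, 2]).shape  # one vector per sample
TensorShape([1, 8])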
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill([features.shape[0]], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
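The EOS masking relies on `tf.sequence_mask`, which turns per-sample lengths into a boolean grid, for instance:

>>> import tensorflow as tf
>>> tf.sequence_mask([2, 4], maxlen=5)
<tf.Tensor: shape=(2, 5), dtype=bool, numpy=
array([[ True,  True, False, False, False],
       [ True,  True,  True,  True, False]])>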
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
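As with detection, `arch` may now be a model instance rather than a name; a sketch with a pre-built recognizer (the architecture choice is illustrative):

>>> from doctr.models import recognition, recognition_predictor
>>> model = recognition.crnn_vgg16_bn(pretrained=True)
>>> predictor = recognition_predictor(arch=model, batch_size=64)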
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates the page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates the page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
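End-to-end usage follows the same pattern as `ocr_predictor`; a minimal sketch on a blank page:

>>> import numpy as np
>>> from doctr.models import kie_predictor
>>> model = kie_predictor(pretrained=True)
>>> out = model([np.zeros((600, 800, 3), dtype=np.uint8)])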
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
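With `preserve_aspect_ratio=True`, a non-square input is first scaled to fit, then zero-padded to the target size; for example:

>>> import tensorflow as tf
>>> from doctr.transforms import Resize
>>> transfo = Resize((32, 32), preserve_aspect_ratio=True)
>>> transfo(tf.random.uniform((64, 128, 3))).shape
TensorShape([32, 32, 3])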
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example::
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: the offset added to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example::
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: the multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduces contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example::
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: the multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduces saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: the offset added to the hue channel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example::
- >>> from doctr.transforms import RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomJpegQuality, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
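+
+# Illustrative check (hypothetical values, assuming anyascii("€") == "EUR"):
+# >>> string_match("Hello", "hello")
+# (False, True, False, True)
+# >>> string_match("EUR", "€")
+# (False, False, True, True)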
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
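
# Illustrative reading of the docstring example above (['Hello', 'world'] vs
# ['hello', 'world']): summary() returns
# {'raw': 0.5, 'caseless': 1.0, 'anyascii': 0.5, 'unicase': 1.0}.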
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
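
# Quick sanity check (illustrative): two 2x2 boxes sharing half their area
# >>> box_iou(np.array([[0, 0, 2, 2]]), np.array([[1, 0, 3, 2]]))
# array([[0.33333334]], dtype=float32)  # intersection=2, union=6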
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
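+
+# Illustrative usage: a unit square against the same square shifted by (0.5, 0.5)
+# >>> sq = np.array([[[0, 0], [1, 0], [1, 1], [0, 1]]], dtype=np.float32)
+# >>> polygon_iou(sq, sq + 0.5)  # intersection 0.25, union 1.75 -> IoU ~0.143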
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
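+
+# Illustrative usage (toy boxes in (xmin, ymin, xmax, ymax, score) format):
+# >>> boxes = np.array([[0, 0, 10, 10, 0.9], [1, 1, 10, 10, 0.8], [20, 20, 30, 30, 0.7]])
+# >>> nms(boxes, thresh=0.5)  # box 1 overlaps box 0 with IoU 0.81 and is dropped
+# [0, 2]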
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
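+
+# Reading the docstring example above: the single ground truth overlaps its best
+# prediction with IoU 0.49, below the 0.5 threshold, so recall and precision are
+# both 0.0 and the mean IoU averages 0.49 over the 2 predictions.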
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the overall recall, precision and mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
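+
+# Illustrative usage: get_colors(3) returns 3 evenly-hued RGB tuples in [0, 1],
+# with lightness and saturation slightly randomized to keep labels distinguishable.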
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mpl Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mpl Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
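+
+# Illustrative usage (a dummy image with one relative box):
+# >>> img = np.zeros((100, 200, 3), dtype=np.uint8)
+# >>> draw_boxes(np.array([[0.1, 0.1, 0.5, 0.9]]), img, color=(0, 255, 0))
+# >>> plt.show()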
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
-
-
+
+
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
-
-
+
+
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-.. autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same horizontal level belong to two distinct Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developer mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Whether performed jointly or separately, each task calls for a dedicated type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used an AWS c5.12xlarge instance (Xeon Platinum 8275L CPU) to run the experiments.
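
A rough sketch of this protocol, assuming a Keras-style callable model (warm-up count, run count and input shape as stated above):

.. code:: python

    import time
    import tensorflow as tf

    def measure_fps(model, shape=(1, 1024, 1024, 3), warmup=100, runs=1000):
        for _ in range(warmup):  # warm-up passes
            model(tf.random.uniform(shape), training=False)
        start = time.perf_counter()
        for _ in range(runs):  # timed batches of 1 frame
            model(tf.random.uniform(shape), training=False)
        return runs / (time.perf_counter() - start)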
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following (sketched in code after the list):
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
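
A sketch of these three steps, assuming TensorFlow; the normalization statistics below are illustrative placeholders:

.. code:: python

    import tensorflow as tf

    def preprocess_detection(images, size=(1024, 1024), mean=0.5, std=0.5):
        # 1. resize each image (bilinear by default), allowing deformation
        resized = [tf.image.resize(img, size, method="bilinear") for img in images]
        # 2. batch images together
        batch = tf.stack(resized, axis=0)
        # 3. normalize the batch with the training data statistics
        return (batch - mean) / std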
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors let you pass NumPy images as inputs and get structured information back.
-
-.. autofunction:: doctr.models.detection.detection_predictor
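
Illustratively, such a predictor could be used as follows (the input below is a placeholder image):

 >>> import numpy as np
 >>> from doctr.models.detection import detection_predictor
 >>> model = detection_predictor(pretrained=True)
 >>> out = model([np.zeros((1024, 1024, 3), dtype=np.uint8)])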
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 30595 word-level crops, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used an AWS c5.12xlarge instance (Xeon Platinum 8275L CPU) to run the experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following (sketched in code after the list):
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
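
A sketch of these four steps, again assuming TensorFlow, with illustrative statistics:

.. code:: python

    import tensorflow as tf

    def preprocess_recognition(crops, size=(32, 128), mean=0.5, std=0.5):
        out = []
        for crop in crops:
            # 1. resize without deformation (aspect ratio preserved)
            img = tf.image.resize(crop, size, preserve_aspect_ratio=True)
            # 2. pad to the target size with zeros
            img = tf.image.pad_to_bounding_box(img, 0, 0, *size)
            out.append(img)
        # 3. batch images together, then 4. normalize the batch
        return (tf.stack(out, axis=0) - mean) / std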
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Predictors combine the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
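
A usage sketch mirroring the detection predictor (placeholder input crop):

 >>> import numpy as np
 >>> from doctr.models.recognition import recognition_predictor
 >>> model = recognition_predictor(pretrained=True)
 >>> out = model([np.zeros((32, 128, 3), dtype=np.uint8)])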
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models used by the predictors are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the predictor, warm up the model, and then measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used an AWS c5.12xlarge instance (Xeon Platinum 8275L CPU) to run the experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection and one stage of text recognition. The text detection stage produces cropped images that are passed to the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
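
A sketch of composing such a two-stage predictor (keyword names assumed from the architectures benchmarked above):

 >>> import numpy as np
 >>> from doctr.models import ocr_predictor
 >>> model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
 >>> result = model([np.zeros((1024, 1024, 3), dtype=np.uint8)])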
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
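
A hedged usage sketch for these helpers; the exact signatures (notably the input shape argument of ``quantize_model``) are assumptions:

 >>> from doctr.models import db_resnet50
 >>> from doctr.models.export import convert_to_tflite, convert_to_fp16, quantize_model
 >>> model = db_resnet50(pretrained=True)
 >>> tflite_model = convert_to_tflite(model)  # assumed to return a serialized model
 >>> fp16_model = convert_to_fp16(model)
 >>> int8_model = quantize_model(model, (1024, 1024, 3))  # input shape is an assumption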
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both the training and inference procedures. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
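
For instance (transformation parameters are illustrative):

 >>> import tensorflow as tf
 >>> from doctr.transforms import Compose, Resize, RandomApply, ColorInversion
 >>> transfo = Compose([Resize((32, 128)), RandomApply(ColorInversion(), p=0.5)])
 >>> out = transfo(tf.random.uniform(shape=[64, 256, 3], maxval=1, dtype=tf.float32))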
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module groups non-core features that complement the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model's performance.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
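
A usage sketch, assuming the update/summary pattern of these metric classes:

 >>> from doctr.utils.metrics import ExactMatch
 >>> metric = ExactMatch()
 >>> metric.update(['Hello', 'world'], ['hello', 'world'])
 >>> metric.summary()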
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
-
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from "CORD: A Consolidated Receipt Dataset for Post-OCR Parsing".
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-Name | size | characters
-digits | 10 | 0123456789
-ascii_letters | 52 | abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-punctuation | 32 | !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-currency | 5 | £€¥¢฿
-latin | 96 | 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-french | 154 | 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as a mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
-
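A sketch of a call matching the signature above (the vocab string is illustrative):
>>> from doctr.datasets import encode_sequences
>>> encoded = encode_sequences(["hello", "world"], vocab="abcdefghijklmnopqrstuvwxyz", target_size=10)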
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
-
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page's size
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same height but in different columns belong to two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and the confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
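To make the hierarchy concrete, a sketch building a document from scratch (values are illustrative, and the Document constructor is assumed to take the list of pages):
>>> from doctr.documents import Word, Line, Block, Page, Document
>>> word = Word("hello", 0.99, ((0.1, 0.1), (0.3, 0.15)))
>>> line = Line([word])
>>> block = Block(lines=[line])
>>> page = Page(blocks=[block], page_idx=0, dimensions=(595, 842))
>>> doc = Document(pages=[page])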
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF file as a bytes stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), "Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition".
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+CRNN from "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition"
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from "CORD: A Consolidated Receipt Dataset for Post-OCR Parsing".
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
AWS Lambda
-
+
diff --git a/using_doctr/sharing_models.html b/using_doctr/sharing_models.html
index 07a3b2f2a3..4f5d1d68a5 100644
--- a/using_doctr/sharing_models.html
+++ b/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -540,7 +540,7 @@ Recognition
-
+
diff --git a/using_doctr/using_contrib_modules.html b/using_doctr/using_contrib_modules.html
index b4a10925e6..cf282ff3a4 100644
--- a/using_doctr/using_contrib_modules.html
+++ b/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -411,7 +411,7 @@ ArtefactDetection
-
+
diff --git a/using_doctr/using_datasets.html b/using_doctr/using_datasets.html
index 4a52df36ba..e30b6d6459 100644
--- a/using_doctr/using_datasets.html
+++ b/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -638,7 +638,7 @@ Data Loading
-
+
diff --git a/using_doctr/using_model_export.html b/using_doctr/using_model_export.html
index 2b30ee63a1..ad9d09ed4c 100644
--- a/using_doctr/using_model_export.html
+++ b/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -463,7 +463,7 @@ Using your ONNX exported model
-
+
diff --git a/using_doctr/using_models.html b/using_doctr/using_models.html
index 13cb06116b..5c80dbf62d 100644
--- a/using_doctr/using_models.html
+++ b/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1249,7 +1249,7 @@ Advanced options
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/cord.html b/v0.1.0/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.0/_modules/doctr/datasets/cord.html
+++ b/v0.1.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/detection.html b/v0.1.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.0/_modules/doctr/datasets/detection.html
+++ b/v0.1.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/funsd.html b/v0.1.0/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.0/_modules/doctr/datasets/funsd.html
+++ b/v0.1.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic03.html b/v0.1.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.0/_modules/doctr/datasets/ic03.html
+++ b/v0.1.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic13.html b/v0.1.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.0/_modules/doctr/datasets/ic13.html
+++ b/v0.1.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiit5k.html b/v0.1.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiithws.html b/v0.1.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/imgur5k.html b/v0.1.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/loader.html b/v0.1.0/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.0/_modules/doctr/datasets/loader.html
+++ b/v0.1.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/mjsynth.html b/v0.1.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ocr.html b/v0.1.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.0/_modules/doctr/datasets/ocr.html
+++ b/v0.1.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/recognition.html b/v0.1.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.0/_modules/doctr/datasets/recognition.html
+++ b/v0.1.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/sroie.html b/v0.1.0/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.0/_modules/doctr/datasets/sroie.html
+++ b/v0.1.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svhn.html b/v0.1.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.0/_modules/doctr/datasets/svhn.html
+++ b/v0.1.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svt.html b/v0.1.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.0/_modules/doctr/datasets/svt.html
+++ b/v0.1.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/synthtext.html b/v0.1.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/utils.html b/v0.1.0/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.0/_modules/doctr/datasets/utils.html
+++ b/v0.1.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/wildreceipt.html b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.0/_modules/doctr/io/elements.html b/v0.1.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.0/_modules/doctr/io/elements.html
+++ b/v0.1.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.0/_modules/doctr/io/html.html b/v0.1.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.0/_modules/doctr/io/html.html
+++ b/v0.1.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/base.html b/v0.1.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.0/_modules/doctr/io/image/base.html
+++ b/v0.1.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/tensorflow.html b/v0.1.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/io/pdf.html b/v0.1.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.0/_modules/doctr/io/pdf.html
+++ b/v0.1.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.0/_modules/doctr/io/reader.html b/v0.1.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.0/_modules/doctr/io/reader.html
+++ b/v0.1.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/zoo.html b/v0.1.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/zoo.html b/v0.1.0/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/factory/hub.html b/v0.1.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.0/_modules/doctr/models/factory/hub.html
+++ b/v0.1.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/zoo.html b/v0.1.0/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/zoo.html b/v0.1.0/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.0/_modules/doctr/models/zoo.html
+++ b/v0.1.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/base.html b/v0.1.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/utils/metrics.html b/v0.1.0/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.0/_modules/doctr/utils/metrics.html
+++ b/v0.1.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.0/_modules/doctr/utils/visualization.html b/v0.1.0/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.0/_modules/doctr/utils/visualization.html
+++ b/v0.1.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.0/_modules/index.html b/v0.1.0/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.0/_modules/index.html
+++ b/v0.1.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.0/_sources/getting_started/installing.rst.txt b/v0.1.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.0/_sources/getting_started/installing.rst.txt
+++ b/v0.1.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.0/_static/basic.css b/v0.1.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.0/_static/basic.css
+++ b/v0.1.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.0/_static/doctools.js b/v0.1.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.0/_static/doctools.js
+++ b/v0.1.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.0/_static/language_data.js b/v0.1.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.0/_static/language_data.js
+++ b/v0.1.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
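For orientation, the stopword list above is the data searchtools.js consults when it normalizes a query before scoring. A minimal sketch of that filtering step, in plain JavaScript; the filterStopwords helper is illustrative only and not part of Sphinx's API:

// Minimal sketch: drop stopwords from a raw query before scoring, the way
// searchtools.js consumes the list shipped in language_data.js. Only a
// subset of the full stopword list above is repeated here.
const stopwords = new Set(["a", "and", "are", "as", "at", "the", "of", "to", "is"]);
function filterStopwords(query) {
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter((term) => term.length > 0 && !stopwords.has(term));
}
console.log(filterStopwords("the installation of docTR"));
// -> [ "installation", "doctr" ]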
diff --git a/v0.1.0/_static/searchtools.js b/v0.1.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.0/_static/searchtools.js
+++ b/v0.1.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
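Taken together, the searchtools.js changes above widen each search result from a six-element to a seven-element array ending in a kind tag (one of the SearchResultKind values), have _displayItem mirror that tag onto the generated list item as a kind-* class so a theme's CSS can style result types individually, and switch _finishSearch to Documentation.ngettext so the result-count status line pluralizes correctly. A minimal sketch of the new result shape, in plain JavaScript; the sample values are made up for illustration and are not Sphinx code:

// Minimal sketch of the seven-element result tuple introduced above; the
// sample values below are invented for illustration.
const result = ["getting_started/installing", "Installation", "", null, 15, "installing.html", "title"];
const [docName, title, anchor, descr, score, filename, kind] = result;
// _displayItem now adds `kind-${kind}` to each generated <li>, so a theme can
// target selectors such as li.kind-title or li.kind-object in its stylesheet.
console.log(`li.kind-${kind} -> "${title}" (${docName}, score ${score})`);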
diff --git a/v0.1.0/changelog.html b/v0.1.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.0/changelog.html
+++ b/v0.1.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.0/community/resources.html b/v0.1.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.0/community/resources.html
+++ b/v0.1.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.0/contributing/code_of_conduct.html b/v0.1.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.0/contributing/code_of_conduct.html
+++ b/v0.1.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.0/contributing/contributing.html b/v0.1.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.0/contributing/contributing.html
+++ b/v0.1.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.0/genindex.html b/v0.1.0/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.0/genindex.html
+++ b/v0.1.0/genindex.html
@@ -13,7 +13,7 @@
-
+
Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.0/getting_started/installing.html b/v0.1.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.0/getting_started/installing.html
+++ b/v0.1.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.0/index.html b/v0.1.0/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.0/index.html
+++ b/v0.1.0/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.0/modules/contrib.html b/v0.1.0/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.0/modules/contrib.html
+++ b/v0.1.0/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.0/modules/datasets.html b/v0.1.0/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.0/modules/datasets.html
+++ b/v0.1.0/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.0/modules/io.html b/v0.1.0/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.0/modules/io.html
+++ b/v0.1.0/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.0/modules/models.html b/v0.1.0/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.0/modules/models.html
+++ b/v0.1.0/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/transforms.html b/v0.1.0/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.0/modules/transforms.html
+++ b/v0.1.0/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
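(The single-line blob ending here is the minified search index that Sphinx regenerates on every docs build. As an illustration only — not part of the patch — the sketch below shows a drastically simplified stand-in for its shape; the key names mirror those visible in the blob and those read by searchtools.js further down, while every concrete entry is invented.)

// Simplified sketch of a Sphinx search index; all concrete entries are made up.
const exampleIndex = {
  alltitles:  { "Installation": [[1, null]] },   // section title -> [docId, anchor]
  docnames:   ["changelog", "getting_started/installing"],
  filenames:  ["changelog.html", "getting_started/installing.html"],
  terms:      { python: [1], instal: [1] },      // stemmed token -> matching doc ids
  titles:     ["Changelog", "Installation"],
  titleterms: { instal: [1] },                   // stemmed tokens from titles only
};
// searchtools.js consumes an object of this shape via Search.setIndex(exampleIndex).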
diff --git a/v0.1.0/using_doctr/custom_models_training.html b/v0.1.0/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.0/using_doctr/custom_models_training.html
+++ b/v0.1.0/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.0/using_doctr/running_on_aws.html b/v0.1.0/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.0/using_doctr/running_on_aws.html
+++ b/v0.1.0/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
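
The comparator body falls outside this hunk; a sketch of its documented intent, assuming `title` sits at index 1 and `score` at index 4 of each result tuple: sort ascending by score so that `pop()` in `_displayNextItem` yields the highest-scored result first, breaking ties reverse-alphabetically so that `pop()` returns titles A-to-Z:

// Assumed comparator shape (illustration only): ascending score, ties
// reverse-alphabetical, because results are consumed from the end via pop().
const compareResults = (a, b) => {
  const [, titleA, , , scoreA] = a;
  const [, titleB, , , scoreB] = b;
  if (scoreA !== scoreB) return scoreA - scoreB;  // lower scores sort first
  return titleB.toLowerCase().localeCompare(titleA.toLowerCase());
};
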
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
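
Taken together, every result tuple produced in this file gains a seventh `kind` field. A hypothetical consumer destructuring the new shape (names and sample values are illustrative only, not part of this patch):

// Illustration: the post-patch 7-element result tuple.
const renderResult = (result) => {
  const [docname, title, anchor, descr, score, filename, kind] = result;
  return `${title} (${kind}, score ${score})`;
};
// renderResult(["index", "docTR", "#", "", 15, "index.html", "title"])
//   ->  "docTR (title, score 15)"
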
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
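The reworked CORD constructor above supports three mutually exclusive loading modes: full pages with boxes and labels (default), word crops paired with their transcriptions (recognition_task=True), and pages with box arrays only (detection_task=True). A minimal sketch of how the new flags combine, following the branches in the code above (download=True fetches the archive on first use):

>>> from doctr.datasets import CORD
>>> # default: target is a dict with "boxes" of shape (N, 4) and "labels"
>>> train_set = CORD(train=True, download=True)
>>> img, target = train_set[0]
>>> # use_polygons=True: "boxes" becomes an (N, 4, 2) corner array
>>> poly_set = CORD(train=True, download=True, use_polygons=True)
>>> # recognition_task=True: each sample is a cropped word and its string
>>> reco_set = CORD(train=True, download=True, recognition_task=True)
>>> crop, word = reco_set[0]
>>> # setting recognition_task and detection_task together raises ValueError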
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
- doctr.datasets.core - docTR documentation
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
\ No newline at end of file
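The deleted doctr.datasets.core module is superseded by doctr.datasets.datasets: every dataset in this diff swaps "from .core import VisionDataset" for "from .datasets import VisionDataset". Judging from the call sites above, the replacement keeps the positional arguments documented here (url, file_name, file_hash, extract_archive) and adds a pre_transforms hook. A minimal sketch of a custom subclass under that assumption (URL and file name are purely illustrative):

>>> from doctr.datasets.datasets import VisionDataset
>>> class MyZipDataset(VisionDataset):
...     def __init__(self, **kwargs):
...         super().__init__(
...             "https://example.com/my_dataset.zip",  # hypothetical URL
...             "my_dataset.zip",                      # cached file name
...             None,                                  # no SHA256 check
...             True,                                  # extract the archive
...             **kwargs,
...         )
...         self.data = []  # fill with (img_name, target) pairs
>>> ds = MyZipDataset(download=True)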
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
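FUNSD stores straight boxes as (xmin, ymin, xmax, ymax), so the use_polygons branch above rebuilds the four corners explicitly rather than reading them from the annotation. A small sketch of that conversion, with illustrative coordinates:

>>> box = [10, 20, 50, 40]  # xmin, ymin, xmax, ymax
>>> corners = [
...     [box[0], box[1]],  # top left
...     [box[2], box[1]],  # top right
...     [box[2], box[3]],  # bottom right
...     [box[0], box[3]],  # bottom left
... ]

In recognition mode, the loop above additionally drops any crop whose label contains a checkbox glyph ("☑", "☐") or one of the two private-use characters, since those cannot be transcribed by a text recognizer.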
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before passing them to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
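Two behavioural changes above are easy to miss: batches are now built with a plain map over __getitem__ (the multithread_exec path and its workers argument are gone), and an optional collate_fn lets callers override how samples are merged into a batch. A short sketch of the new parameter (custom_collate is an illustrative helper, not part of the library):

>>> from doctr.datasets import CORD, DataLoader
>>> def custom_collate(samples):
...     # keep raw samples instead of stacking image tensors
...     images, targets = zip(*samples)
...     return list(images), list(targets)
>>> train_loader = DataLoader(CORD(train=True, download=True), batch_size=32, collate_fn=custom_collate)
>>> num_batches = len(train_loader)  # supported via the new __len__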
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
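Unlike CORD and FUNSD, SROIE vectorises the polygon-to-box reduction: all quadrilaterals of a page are stacked into an (N, 4, 2) array first, and straight boxes then come out of a single concatenation. A tiny numeric illustration of that step (values are made up):

>>> import numpy as np
>>> coords = np.array([[[10, 20], [50, 20], [50, 40], [10, 40]]], dtype=np.float32)  # (1, 4, 2)
>>> np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)  # xmin, ymin, xmax, ymax
array([[10., 20., 50., 40.]], dtype=float32)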
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char still not in vocab, return unknown character
char = unknown_char
translated += char
return translated
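In other words, a character absent from the target vocab is first NFD-normalised and stripped to ASCII before the unknown_char fallback kicks in. A small illustration, assuming the chosen vocab contains "e" but not "é" (which should hold for the "english" entry in VOCABS):

>>> from doctr.datasets.utils import translate
>>> translate("café", "english")
'cafe'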
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
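# Editorial sketch: sos/eos/pad must sit outside the vocab index range; with a
# 3-char vocab, 3/4/5 are safe picks. Each row ends up as [sos, chars, eos, pad...].
# >>> encode_sequences(["ab", "cab"], vocab="abc", sos=3, eos=4, pad=5)
# array([[3, 0, 1, 4, 5, 5],
#        [3, 2, 0, 1, 4, 5]], dtype=int32)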
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
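# Editorial sketch, assuming get_img_shape returns (H, W) for the image type in
# use (framework image tensors in practice): with H=100, W=200 and absolute
# boxes [[20, 10, 180, 90]], the returned target is approximately
# [[0.1, 0.1, 0.9, 0.9]] in relative coordinates.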
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
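# Editorial sketch: polygons sharing a class name are stacked together, so each
# class maps to an (M, 4, 2) array of relative polygons (assuming get_img_shape
# accepts the input image).
# >>> _, tgt = pre_transform_multiclass(img, (polys, ["words", "titles", "words"]))
# >>> {k: v.shape for k, v in tgt.items()}
# {'titles': (1, 4, 2), 'words': (2, 4, 2)}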
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- doctr.documents.elements - docTR documentation
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- doctr.documents.reader - docTR documentation
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document as a fitz.Document object
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Default goes to 840 x 595 for A4 pdf,
- if you want to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
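# Editorial usage sketch (not part of the deleted source, legacy PyMuPDF API):
# >>> doc = read_pdf("path/to/doc.pdf")
# >>> page = convert_page_to_numpy(doc[0], output_size=(1024, 726))
# page is then a (1024, 726, 3) uint8 array rendered at the requested size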
-
-
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-class DocumentFile:
- """Read a document from multiple extensions"""
-
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to expand (unclip) polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: The first parameter.
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
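# Editorial note (not part of the deleted source): the offset distance follows
# D = area * unclip_ratio / perimeter; e.g. a 100x20 box with unclip_ratio=1.5
# expands outward by 2000 * 1.5 / 240 = 12.5 px.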
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1): # fuse from the deepest level upward
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature maps is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon : array of coord., to draw the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
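# Editorial note (not part of the deleted source): the shrink offset follows the
# DB paper, D = A * (1 - r**2) / L with r = shrink_ratio; a 100x20 box (A=2000,
# L=240) with r=0.4 gives D = 2000 * 0.84 / 240 = 7.0 px.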
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool) # np.bool is a deprecated alias of the builtin bool
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Tuple[int, int, int] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num): # label 0 is the background component
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool) # np.bool is a deprecated alias of the builtin bool
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Tuple[int, int, int] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
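
For reference, a minimal sketch of the deleted v0.2.0 LinkNet training API shown above -- assuming relative xmin/ymin/xmax/ymax `boxes` and boolean `flags` per word, as `compute_target` expects:

import numpy as np
import tensorflow as tf

from doctr.models import linknet

model = linknet(pretrained=False)
x = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
# One word box in relative coordinates, not flagged as ambiguous
target = [{"boxes": np.array([[0.1, 0.1, 0.4, 0.2]], dtype=np.float32),
           "flags": np.array([False])}]
out = model(x, target=target, return_model_output=True)
print(out["loss"], out["out_map"].shape)  # scalar BCE loss, N x H x W x 1 probability map
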
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
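
The hunk above lets `detection_predictor` accept an already-built model in place of an architecture name, and reparameterizes FAST models automatically; a minimal sketch, assuming pretrained weights are available:

from doctr.models import detection, detection_predictor

# By name, with the new defaults (fast_base, batch_size=2)
predictor = detection_predictor("fast_base", pretrained=True)

# Or from a model instance, e.g. to tweak it before wrapping
model = detection.db_resnet50(pretrained=True)
predictor = detection_predictor(arch=model, assume_straight_pages=True)
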
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized TFLite model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
-
\ No newline at end of file
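
The removed helpers above each return serialized bytes; persisting them to a `.tflite` file (standard TensorFlow Lite practice, not a docTR API) would look like this sketch:

from tensorflow.keras import Sequential

from doctr.models import conv_sequence, convert_to_fp16

model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
serialized = convert_to_fp16(model)
with open("model_fp16.tflite", "wb") as f:
    f.write(serialized)  # reload later via tf.lite.Interpreter(model_path="model_fp16.tflite")
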
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
- with label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of ground-truth labels
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Decode raw predictions into words
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
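
A minimal sketch of the deleted CRNN call signature above -- passing `target` triggers the CTC loss, `return_preds` the greedy CTC decoding:

import tensorflow as tf

from doctr.models import crnn_vgg16_bn

model = crnn_vgg16_bn(pretrained=False)
x = tf.random.uniform(shape=[2, 32, 128, 3], maxval=1, dtype=tf.float32)
out = model(x, target=["hello", "world"], return_preds=True)
print(out["loss"].shape, out["preds"])  # per-sample CTC loss and decoded words
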
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embeded_symbol: shape (N, embedding_units)
- embeded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embeded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Decode raw predictions into words
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
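
A minimal sketch of the deleted SAR call signature above; labels are required at training time for teacher forcing (see `SARDecoder.call`), while inference feeds back the argmax prediction at each step:

import tensorflow as tf

from doctr.models import sar_resnet31

model = sar_resnet31(pretrained=False)
x = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
train_out = model(x, target=["sample"])   # masked cross-entropy loss under "loss"
infer_out = model(x, return_preds=True)   # decoded words under "preds"
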
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
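
As with detection, the recognition zoo now accepts a model instance for `arch`; a minimal sketch, assuming pretrained weights are available:

from doctr.models import recognition, recognition_predictor

reco = recognition.crnn_vgg16_bn(pretrained=True)
predictor = recognition_predictor(arch=reco, batch_size=64)
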
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `KIEPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
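
The reworked top-level zoo exposes the same options on both predictors; a minimal sketch of the API introduced above (architecture names per the lists earlier in this diff):

import numpy as np

from doctr.models import kie_predictor, ocr_predictor

page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)

ocr = ocr_predictor("db_resnet50", "crnn_vgg16_bn", pretrained=True,
                    assume_straight_pages=False, detect_orientation=True)
kie = kie_predictor(pretrained=True)  # defaults to fast_base + crnn_vgg16_bn

ocr_out = ocr([page])
kie_out = kie([page])
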
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example::
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: the offset added to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example::
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example::
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: the hue offset is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example:
- >>> from doctr.transforms import RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: minimum JPEG quality, int in [0, 100]
- max_quality: maximum JPEG quality, int in [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomJpegQuality, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
-
-
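To make the composition helpers above (OneOf, RandomApply) concrete, a minimal doctest-style sketch of an augmentation pipeline, assuming the TF backend and the class names defined in this module:

>>> import tensorflow as tf
>>> from doctr.transforms import OneOf, RandomApply, RandomBrightness, RandomJpegQuality
>>> # Half of the time, apply either a brightness shift or a JPEG-quality degradation
>>> augment = RandomApply(OneOf([RandomBrightness(max_delta=0.3), RandomJpegQuality(min_quality=60)]), p=0.5)
>>> out = augment(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))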
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
doctr.transforms.modules.base - docTR documentation
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
doctr.transforms.modules.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
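To make the four tolerance levels concrete, a quick doctest-style sketch of string_match (assuming anyascii maps "€" to "EUR", as the comment above implies):

>>> from doctr.utils.metrics import string_match
>>> string_match("Hello", "hello")  # raw, caseless, anyascii, unicase
(False, True, False, True)
>>> string_match("EUR", "€")
(False, False, True, True)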
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
 gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns:
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
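A worked doctest-style run of the four aggregated scores (values computed by hand from the implementation above; anyascii is assumed to strip the accent from "é"):

>>> from doctr.utils.metrics import TextMatch
>>> metric = TextMatch()
>>> metric.update(["Grand", "Café"], ["grand", "Cafe"])
>>> metric.summary()
{'raw': 0.0, 'caseless': 0.5, 'anyascii': 0.5, 'unicase': 1.0}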
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
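For instance, with two partially overlapping boxes (a sketch; the expected value is worked out by hand: intersection 1, union 4 + 4 - 1 = 7):

>>> import numpy as np
>>> from doctr.utils.metrics import box_iou
>>> boxes_1 = np.array([[0, 0, 2, 2]], dtype=np.float32)
>>> boxes_2 = np.array([[1, 1, 3, 3]], dtype=np.float32)
>>> iou = box_iou(boxes_1, boxes_2)
>>> iou.shape
(1, 1)
>>> round(float(iou[0, 0]), 4)  # 1 / 7
0.1429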
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
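A quick sketch with two axis-aligned unit squares expressed as 4-point polygons (intersection 0.5, union 1.5, hence IoU of 1/3):

>>> import numpy as np
>>> from doctr.utils.metrics import polygon_iou
>>> polys_1 = np.array([[[0, 0], [1, 0], [1, 1], [0, 1]]], dtype=np.float32)
>>> polys_2 = np.array([[[0.5, 0], [1.5, 0], [1.5, 1], [0.5, 1]]], dtype=np.float32)
>>> round(float(polygon_iou(polys_1, polys_2)[0, 0]), 4)
0.3333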
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
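A short sketch of the suppression behaviour (the IoU between the first two boxes is 81/119 ≈ 0.68, above the 0.5 threshold, so the lower-scored one is dropped):

>>> import numpy as np
>>> from doctr.utils.metrics import nms
>>> boxes = np.array([
...     [0, 0, 10, 10, 0.9],    # highest score, kept
...     [1, 1, 11, 11, 0.8],    # heavy overlap with the first box, suppressed
...     [50, 50, 60, 60, 0.7],  # disjoint, kept
... ], dtype=np.float32)
>>> [int(i) for i in nms(boxes, thresh=0.5)]
[0, 2]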
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns:
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
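A worked doctest-style run (values derived by hand from the implementation above: the single ground truth is matched, one of the two predictions is spurious, so the mean IoU is 1.0 / 2):

>>> import numpy as np
>>> from doctr.utils.metrics import LocalizationConfusion
>>> metric = LocalizationConfusion(iou_thresh=0.5)
>>> metric.update(np.array([[0.1, 0.1, 0.4, 0.4]]), np.array([[0.1, 0.1, 0.4, 0.4], [0.6, 0.6, 0.9, 0.9]]))
>>> metric.summary()  # (recall, precision, mean IoU)
(1.0, 0.5, 0.5)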
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns:
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
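A worked sketch combining localization and text comparison (one ground truth, one perfectly localized prediction whose transcription only differs in case):

>>> import numpy as np
>>> from doctr.utils.metrics import OCRMetric
>>> metric = OCRMetric(iou_thresh=0.5)
>>> metric.update(np.array([[0.1, 0.1, 0.4, 0.4]]), np.array([[0.1, 0.1, 0.4, 0.4]]), ['Hello'], ['hello'])
>>> recall, precision, mean_iou = metric.summary()
>>> recall['raw'], recall['caseless']
(0.0, 1.0)
>>> mean_iou
1.0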
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns:
+ -------
+ a tuple with the recall, the precision and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
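And a minimal sketch with a single correctly classified detection (expected output worked out by hand from the code above):

>>> import numpy as np
>>> from doctr.utils.metrics import DetectionMetric
>>> metric = DetectionMetric(iou_thresh=0.5)
>>> metric.update(np.array([[0.1, 0.1, 0.4, 0.4]]), np.array([[0.1, 0.1, 0.4, 0.4]]),
...               np.array([0], dtype=np.int64), np.array([0], dtype=np.int64))
>>> metric.summary()  # (recall, precision, mean IoU)
(1.0, 1.0, 1.0)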
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
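A quick sketch of the dispatch on a straight box: a 2-point tuple routes to rect_patch, and the relative coordinates are scaled by the page dimensions, here (height, width) = (200, 400). Note that create_obj_patch is a module-level helper rather than part of the public __all__:

>>> from doctr.utils.visualization import create_obj_patch
>>> patch = create_obj_patch(((0.25, 0.25), (0.75, 0.5)), (200, 400), label="word", color=(0, 0, 1))
>>> patch.get_xy(), patch.get_width(), patch.get_height()
((100.0, 50.0), 200.0, 50.0)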
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
 image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mlp Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mlp Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
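A minimal usage sketch (the function copies the boxes, draws the rectangles on the image with OpenCV and hands it to matplotlib; call plt.show() afterwards to display the result):

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from doctr.utils.visualization import draw_boxes
>>> image = np.zeros((100, 200, 3), dtype=np.uint8)  # blank page
>>> boxes = np.array([[0.1, 0.2, 0.6, 0.8]])  # relative (xmin, ymin, xmax, ymax)
>>> draw_boxes(boxes, image, color=(255, 0, 0))
>>> plt.show()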
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
Overview: module code - docTR documentation
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-.. autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same height but in different columns form two distinct Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developer mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Whether they are performed jointly or separately, each task corresponds to a specific type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up, then measure its average speed over 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used an AWS c5.12xlarge instance (Xeon Platinum 8275L CPU) to perform the experiments.
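-
-A minimal sketch of this benchmark, assuming the TensorFlow backend and the ``db_resnet50`` architecture:
-
-.. code:: python
-
-    import time
-
-    import tensorflow as tf
-    from doctr.models import db_resnet50
-
-    model = db_resnet50(pretrained=True)
-
-    # Warm-up: 100 forward passes on random tensors
-    for _ in range(100):
-        dummy = tf.random.uniform([1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
-        _ = model(dummy, training=False)
-
-    # Timed runs: 1000 batches of a single frame each
-    start = time.time()
-    for _ in range(1000):
-        frame = tf.random.uniform([1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
-        _ = model(frame, training=False)
-    print(f"FPS: {1000 / (time.time() - start):.2f}")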
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following (sketched below):
-
-1. Resize each input image to the target size (bilinear interpolation by default), with potential deformation.
-2. Batch images together.
-3. Normalize the batch using the training data statistics.
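-
-A rough equivalent with raw TensorFlow ops (the actual PreProcessor handles this internally; ``mean`` and ``std`` below are placeholder statistics, not the real training values):
-
-.. code:: python
-
-    import tensorflow as tf
-
-    def preprocess_for_detection(images, target_size=(1024, 1024), mean=0.5, std=1.0):
-        # 1. Resize each image to the target size (aspect ratio may be deformed)
-        resized = [tf.image.resize(img, target_size, method="bilinear") for img in images]
-        # 2. Batch images together
-        batch = tf.stack(resized, axis=0)
-        # 3. Normalize the batch using the training data statistics
-        return (batch - mean) / std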
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (a binary segmentation map, for instance) into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors let you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
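-
-A usage sketch, assuming the default architecture:
-
- >>> import numpy as np
- >>> from doctr.models.detection import detection_predictor
- >>> predictor = detection_predictor(pretrained=True)
- >>> # a dummy page in numpy format (H x W x 3)
- >>> out = predictor([np.zeros((1024, 1024, 3), dtype=np.uint8)])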
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 30595 word-level crops, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 32, 128, 3] as a warm-up, then measure its average speed over 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used an AWS c5.12xlarge instance (Xeon Platinum 8275L CPU) to perform the experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following (sketched below):
-
-1. Resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. Pad the image to the target size (with zeros by default).
-3. Batch images together.
-4. Normalize the batch using the training data statistics.
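-
-A rough equivalent with TensorFlow ops (again, the PreProcessor does this for you; the statistics are placeholders):
-
-.. code:: python
-
-    import tensorflow as tf
-
-    def preprocess_for_recognition(images, target_size=(32, 128), mean=0.5, std=1.0):
-        processed = []
-        for img in images:
-            # 1. Resize without deformation (aspect ratio preserved)
-            img = tf.image.resize(img, target_size, preserve_aspect_ratio=True)
-            # 2. Pad to the target size (with zeros)
-            img = tf.image.pad_to_bounding_box(img, 0, 0, target_size[0], target_size[1])
-            processed.append(img)
-        # 3. Batch images together
-        batch = tf.stack(processed, axis=0)
-        # 4. Normalize using the training data statistics
-        return (batch - mean) / std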
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence) into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
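-
-A usage sketch, assuming the default architecture:
-
- >>> import numpy as np
- >>> from doctr.models.recognition import recognition_predictor
- >>> predictor = recognition_predictor(pretrained=True)
- >>> # a dummy word crop in numpy format (H x W x 3)
- >>> out = predictor([np.zeros((32, 128, 3), dtype=np.uint8)])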
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| AWS Textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models used in these predictors are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the predictor, warm up the model, and then measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used an AWS c5.12xlarge instance (Xeon Platinum 8275L CPU) to perform the experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-These architectures involve one stage of text detection and one stage of text recognition: the detection output is used to produce cropped images that are then passed to the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
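-
-A minimal usage sketch (the document path is a placeholder):
-
- >>> from doctr.documents import DocumentFile
- >>> from doctr.models import ocr_predictor
- >>> predictor = ocr_predictor(pretrained=True)
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
- >>> result = predictor(pages)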
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
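-
-A usage sketch (each helper is expected to return a serialized, converted model; the input shape passed to ``quantize_model`` is an assumption matching the detection models above):
-
- >>> from doctr.models import db_resnet50
- >>> from doctr.models.export import convert_to_tflite, convert_to_fp16, quantize_model
- >>> model = db_resnet50(pretrained=True)
- >>> tflite_model = convert_to_tflite(model)
- >>> fp16_model = convert_to_fp16(model)
- >>> quantized_model = quantize_model(model, (1024, 1024, 3))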
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedures. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
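-
-For example, to chain a resizing with a grayscale conversion (TensorFlow backend assumed):
-
- >>> import tensorflow as tf
- >>> from doctr.transforms import Compose, Resize, ToGray
- >>> transfo = Compose([Resize((32, 128)), ToGray()])
- >>> out = transfo(tf.random.uniform(shape=[64, 256, 3], maxval=1, dtype=tf.float32))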
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module gathers non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
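-
-A usage sketch, reusing an end-to-end OCR predictor (the document path is a placeholder):
-
- >>> import matplotlib.pyplot as plt
- >>> from doctr.documents import DocumentFile
- >>> from doctr.models import ocr_predictor
- >>> from doctr.utils.visualization import visualize_page
- >>> predictor = ocr_predictor(pretrained=True)
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
- >>> out = predictor(pages)
- >>> visualize_page(out.pages[0].export(), pages[0])
- >>> plt.show()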
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model's performance.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
-
- doctr.datasets - docTR documentation
-
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-Name          | Size | Characters
-digits        | 10   | 0123456789
-ascii_letters | 52   | abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-punctuation   | 32   | !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-currency      | 5    | £€¥¢฿
-latin         | 96   | 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-french        | 154  | 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a numpy array
-
-
-
-
-
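-
-A usage sketch (the toy vocab below is an illustration; in practice, use one of the ordered vocabs listed above):
->>> from doctr.datasets import encode_sequences
->>> encoded = encode_sequences(["hello", "world"], vocab="abcdefghijklmnopqrstuvwxyz", target_size=8)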
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
-
- doctr.documents - docTR documentation
-
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page's size
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same height in different columns belong to two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert its pages into images in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF as a bytes stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩‍🔬 for research: quickly compare your own architecture's speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
ArtefactDetection
-
+
diff --git a/using_doctr/using_datasets.html b/using_doctr/using_datasets.html
index 4a52df36ba..e30b6d6459 100644
--- a/using_doctr/using_datasets.html
+++ b/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -638,7 +638,7 @@ Data Loading
-
+
diff --git a/using_doctr/using_model_export.html b/using_doctr/using_model_export.html
index 2b30ee63a1..ad9d09ed4c 100644
--- a/using_doctr/using_model_export.html
+++ b/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -463,7 +463,7 @@ Using your ONNX exported model
-
+
diff --git a/using_doctr/using_models.html b/using_doctr/using_models.html
index 13cb06116b..5c80dbf62d 100644
--- a/using_doctr/using_models.html
+++ b/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1249,7 +1249,7 @@ Advanced options
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/cord.html b/v0.1.0/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.0/_modules/doctr/datasets/cord.html
+++ b/v0.1.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/detection.html b/v0.1.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.0/_modules/doctr/datasets/detection.html
+++ b/v0.1.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/funsd.html b/v0.1.0/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.0/_modules/doctr/datasets/funsd.html
+++ b/v0.1.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic03.html b/v0.1.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.0/_modules/doctr/datasets/ic03.html
+++ b/v0.1.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic13.html b/v0.1.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.0/_modules/doctr/datasets/ic13.html
+++ b/v0.1.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiit5k.html b/v0.1.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiithws.html b/v0.1.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/imgur5k.html b/v0.1.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/loader.html b/v0.1.0/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.0/_modules/doctr/datasets/loader.html
+++ b/v0.1.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/mjsynth.html b/v0.1.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ocr.html b/v0.1.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.0/_modules/doctr/datasets/ocr.html
+++ b/v0.1.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/recognition.html b/v0.1.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.0/_modules/doctr/datasets/recognition.html
+++ b/v0.1.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/sroie.html b/v0.1.0/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.0/_modules/doctr/datasets/sroie.html
+++ b/v0.1.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svhn.html b/v0.1.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.0/_modules/doctr/datasets/svhn.html
+++ b/v0.1.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svt.html b/v0.1.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.0/_modules/doctr/datasets/svt.html
+++ b/v0.1.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/synthtext.html b/v0.1.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/utils.html b/v0.1.0/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.0/_modules/doctr/datasets/utils.html
+++ b/v0.1.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/wildreceipt.html b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.0/_modules/doctr/io/elements.html b/v0.1.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.0/_modules/doctr/io/elements.html
+++ b/v0.1.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.0/_modules/doctr/io/html.html b/v0.1.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.0/_modules/doctr/io/html.html
+++ b/v0.1.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/base.html b/v0.1.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.0/_modules/doctr/io/image/base.html
+++ b/v0.1.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/tensorflow.html b/v0.1.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/io/pdf.html b/v0.1.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.0/_modules/doctr/io/pdf.html
+++ b/v0.1.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.0/_modules/doctr/io/reader.html b/v0.1.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.0/_modules/doctr/io/reader.html
+++ b/v0.1.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/zoo.html b/v0.1.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
<
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/zoo.html b/v0.1.0/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/factory/hub.html b/v0.1.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.0/_modules/doctr/models/factory/hub.html
+++ b/v0.1.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/zoo.html b/v0.1.0/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/zoo.html b/v0.1.0/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.0/_modules/doctr/models/zoo.html
+++ b/v0.1.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/base.html b/v0.1.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/utils/metrics.html b/v0.1.0/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.0/_modules/doctr/utils/metrics.html
+++ b/v0.1.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.0/_modules/doctr/utils/visualization.html b/v0.1.0/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.0/_modules/doctr/utils/visualization.html
+++ b/v0.1.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.0/_modules/index.html b/v0.1.0/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.0/_modules/index.html
+++ b/v0.1.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.0/_sources/getting_started/installing.rst.txt b/v0.1.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.0/_sources/getting_started/installing.rst.txt
+++ b/v0.1.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.0/_static/basic.css b/v0.1.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.0/_static/basic.css
+++ b/v0.1.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.0/_static/doctools.js b/v0.1.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.0/_static/doctools.js
+++ b/v0.1.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.0/_static/language_data.js b/v0.1.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.0/_static/language_data.js
+++ b/v0.1.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
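The stopword list above is what searchtools.js consults when parsing a query: any term on the list is dropped before the index is searched. A minimal sketch of that filtering step, assuming the global stopwords array from this file (the tokenize helper is a hypothetical stand-in for Sphinx's own query splitter):

// Sketch: drop stopwords from a raw query before hitting the search index.
// `tokenize` is a hypothetical stand-in for Sphinx's own query splitter.
const tokenize = (query) => query.toLowerCase().split(/\s+/).filter(Boolean);
const filterStopwords = (query) =>
  tokenize(query).filter((term) => !stopwords.includes(term));

filterStopwords("the docTR models are fast"); // -> ["doctr", "models", "fast"]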
diff --git a/v0.1.0/_static/searchtools.js b/v0.1.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.0/_static/searchtools.js
+++ b/v0.1.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
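The hunk above introduces SearchResultKind, a four-value enum (index, object, text, title) that themes can use to style search results by category: as the later hunks in this file show, _displayItem now tags each rendered result <li> with a matching kind-* class. A short sketch of what that hook enables, assuming the default #search-results container; the badge logic is illustrative, not part of Sphinx:

// Sketch: decorate title matches in the rendered result list.
// Relies on the `kind-title` class that _displayItem now adds to each <li>.
document.querySelectorAll("#search-results li.kind-title").forEach((li) => {
  const badge = document.createElement("span");
  badge.textContent = " (section title)";
  li.appendChild(badge);
});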
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
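The status-message change above replaces a single hard-coded "page(s)" string with Documentation.ngettext, which selects the singular or plural form based on resultCount (and, in translated builds, on the loaded message catalog). For untranslated English the behavior reduces to a sketch like this; the real helper also consults translation data:

// Sketch: ngettext-style plural selection, untranslated English only.
const ngettext = (singular, plural, n) => (n === 1 ? singular : plural);
ngettext("found one page", "found ${resultCount} pages", 1); // singular form
ngettext("found one page", "found ${resultCount} pages", 3); // plural form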
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
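Taken together, the searchtools.js changes widen every result tuple from six to seven elements, [docname, title, anchor, descr, score, filename, kind], with kind filled in at each of the four collection sites (title, index, object, and full-text matches). Because searchtools only defines Scorer when it is undefined, a theme can ship its own before this script loads; a sketch of an override that uses the new seventh field (the boost value is arbitrary, and a real override must also supply the other Scorer weights searchtools reads):

// Sketch: theme-level Scorer override consuming the new `kind` field.
// Must be defined before searchtools.js runs, per the typeof guard above.
const Scorer = {
  score: (result) => {
    const [docname, title, anchor, descr, score, filename, kind] = result;
    return kind === "object" ? score + 5 : score; // favor API hits slightly
  },
};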
diff --git a/v0.1.0/changelog.html b/v0.1.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.0/changelog.html
+++ b/v0.1.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.0/community/resources.html b/v0.1.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.0/community/resources.html
+++ b/v0.1.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.0/contributing/code_of_conduct.html b/v0.1.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.0/contributing/code_of_conduct.html
+++ b/v0.1.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.0/contributing/contributing.html b/v0.1.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.0/contributing/contributing.html
+++ b/v0.1.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.0/genindex.html b/v0.1.0/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.0/genindex.html
+++ b/v0.1.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.0/getting_started/installing.html b/v0.1.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.0/getting_started/installing.html
+++ b/v0.1.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.0/index.html b/v0.1.0/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.0/index.html
+++ b/v0.1.0/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.0/modules/contrib.html b/v0.1.0/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.0/modules/contrib.html
+++ b/v0.1.0/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.0/modules/datasets.html b/v0.1.0/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.0/modules/datasets.html
+++ b/v0.1.0/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.0/modules/io.html b/v0.1.0/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.0/modules/io.html
+++ b/v0.1.0/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.0/modules/models.html b/v0.1.0/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.0/modules/models.html
+++ b/v0.1.0/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/transforms.html b/v0.1.0/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.0/modules/transforms.html
+++ b/v0.1.0/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
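The searchindex.js hunk above replaces the whole serialized Sphinx search index in a single line. Its shape is visible in the payload itself: "alltitles" maps each section title to [document index, anchor] pairs, and "titles" lists page titles in document order. Below is a minimal sketch of resolving such an entry client-side; the miniature index literal is made up for illustration, and the "docnames" key is assumed from the standard Sphinx searchindex layout rather than shown in this hunk.

// Miniature stand-in for the Search.setIndex payload replaced above.
// "docnames" is assumed from the usual Sphinx searchindex layout (not visible here).
const index = {
  docnames: ["changelog", "getting_started/installing"],
  titles: ["Changelog", "Installation"],
  alltitles: { "Via Git": [[1, "via-git"]], "Installation": [[1, null]] },
};

// Map a section title to concrete URL fragments, tolerating null anchors
// (the real payload contains entries like [[14, null]] as well).
function resolveTitle(idx, title) {
  return (idx.alltitles[title] || []).map(
    ([doc, anchor]) => `${idx.docnames[doc]}.html${anchor ? "#" + anchor : ""}`
  );
}

console.log(resolveTitle(index, "Via Git")); // ["getting_started/installing.html#via-git"]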
diff --git a/v0.1.0/using_doctr/custom_models_training.html b/v0.1.0/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.0/using_doctr/custom_models_training.html
+++ b/v0.1.0/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.0/using_doctr/running_on_aws.html b/v0.1.0/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.0/using_doctr/running_on_aws.html
+++ b/v0.1.0/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
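Taken together, the searchtools.js hunks above make one coherent change: every result tuple grows a trailing "kind" drawn from the new SearchResultKind enum, _displayItem stamps it onto the result's list item as a kind-${kind} class, and the status line switches to Documentation.ngettext for proper pluralization. A sketch of how a theme-side script could consume those classes; only the selector and the four kind-* class names come from the diff, while the data attribute and labels are illustrative.

// Theme-side hook for the kind-<kind> classes added in _displayItem.
// Run after searchtools.js has rendered #search-results > ul.search.
const kindLabels = {
  "kind-title": "page title",
  "kind-index": "index entry",
  "kind-object": "API object",
  "kind-text": "full-text match",
};
document.querySelectorAll("#search-results ul.search li").forEach((li) => {
  for (const [cls, label] of Object.entries(kindLabels)) {
    // data-result-kind is a hypothetical attribute a theme might style via CSS.
    if (li.classList.contains(cls)) li.setAttribute("data-result-kind", label);
  }
});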
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶<
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for a recognition task
+ detection_task: whether the dataset should be used for a detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
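The refactored CORD above replaces the TensorFlow-specific __getitem__/collate_fn pair with backend-agnostic loading and adds three target modes. A minimal usage sketch, assuming the new base class resolves each stored entry to an (image, target) pair at access time:
>>> from doctr.datasets import CORD
>>> # default mode: full images with boxes and labels
>>> train_set = CORD(train=True, download=True)
>>> img, target = train_set[0]  # target is dict(boxes=..., labels=...)
>>> # recognition mode: word crops paired with their transcription
>>> reco_set = CORD(train=True, download=True, recognition_task=True)
>>> crop, word = reco_set[0]
>>> # detection mode: images paired with box coordinates only
>>> det_set = CORD(train=True, download=True, detection_task=True)
>>> img, boxes = det_set[0]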
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
- doctr.datasets.core - docTR documentation
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
\ No newline at end of file
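The deleted core module's VisionDataset lives on under the datasets module, as the updated imports above show (from .datasets import VisionDataset). A rough subclassing sketch, keeping the url/file_name/file_hash/extract_archive signature documented in the removed docstring; the import path and the archive details are assumptions for illustration:
>>> from doctr.datasets.datasets import VisionDataset  # path inferred from the relative imports above
>>> class MyZipDataset(VisionDataset):
...     # hypothetical archive: substitute a real URL, file name and SHA256
...     def __init__(self, **kwargs):
...         super().__init__("https://example.com/data.zip", "data.zip", None, True, **kwargs)
...         self.data = []  # fill with (img_path, target) pairs relative to self.root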
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for a recognition task
+ detection_task: whether the dataset should be used for a detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
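As with CORD, the new FUNSD keyword arguments select between full, recognition-only and detection-only targets, and use_polygons expands each straight box into four corner points. A short sketch under the same assumptions:
>>> from doctr.datasets import FUNSD
>>> train_set = FUNSD(train=True, download=True, use_polygons=True)
>>> img, target = train_set[0]
>>> # target["boxes"] now has shape (num_words, 4, 2) instead of (num_words, 4)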
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
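The reworked DataLoader drops the multithreaded workers argument in favour of an overridable collate_fn, and gains __len__. A brief sketch of plugging in a custom collate function, assuming a dataset that yields (image, target) pairs:
>>> from doctr.datasets import CORD, DataLoader
>>> def list_collate(samples):
...     # keep images as a list instead of stacking them into one tensor
...     images, targets = zip(*samples)
...     return list(images), list(targets)
>>> train_set = CORD(train=True, download=True)
>>> train_loader = DataLoader(train_set, batch_size=32, collate_fn=list_collate)
>>> n_batches = len(train_loader)  # now well-defined thanks to __len__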
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for a recognition task
+ detection_task: whether the dataset should be used for a detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
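SROIE follows the same three-mode pattern, with the annotation parsing now vectorised into a single (N, 4, 2) coordinate stack. A usage sketch:
>>> from doctr.datasets import SROIE
>>> test_set = SROIE(train=False, download=True, detection_task=True)
>>> img, boxes = test_set[0]  # (num_entries, 4) straight boxes; (num_entries, 4, 2) with use_polygons=True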
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char still not in vocab, return unknown character
char = unknown_char
translated += char
return translated
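For illustration, a call to translate, assuming "french" is one of the VOCABS keys as the docstring suggests:
>>> from doctr.datasets.utils import translate
>>> text = translate("α café", "french")  # 'α' has no ASCII normalization, so it maps to the unknown_char '■'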
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: a array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
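A hedged usage sketch for `crop_bboxes_from_image` (the image path is a placeholder; boxes are absolute pixel coordinates):

import numpy as np

boxes = np.array([[15, 20, 120, 45], [10, 60, 90, 80]])    # (N, 4) straight boxes
crops = crop_bboxes_from_image("path/to/page.jpg", boxes)  # list of H x W x 3 crops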
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
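The grouping step of `pre_transform_multiclass` can be exercised standalone; a sketch with made-up polygons:

import numpy as np

polys = np.random.rand(3, 4, 2)          # three (4, 2) polygons
classes = ["words", "title", "words"]
boxes_dict = {k: [] for k in sorted(set(classes))}
for k, poly in zip(classes, polys):
    boxes_dict[k].append(poly)
boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
# {"title": array of shape (1, 4, 2), "words": array of shape (2, 4, 2)}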
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
\ No newline at end of file
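For context, a minimal sketch of how the element hierarchy removed above composed in v0.1.x (the relative geometries are made up; `dimensions` is (width, height)):

from doctr.documents.elements import Word, Line, Block, Page, Document

w1 = Word("Hello", 0.99, ((0.10, 0.10), (0.25, 0.14)))
w2 = Word("world", 0.98, ((0.27, 0.10), (0.40, 0.14)))
line = Line([w1, w2])                 # geometry resolves to the enclosing bbox
page = Page(blocks=[Block(lines=[line])], page_idx=0, dimensions=(595, 842))
doc = Document(pages=[page])
assert doc.render() == "Hello world"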
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document as a fitz.Document object
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 pdf;
- to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
-
-
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
\ No newline at end of file
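Likewise, a sketch of how the removed reader module tied together (paths are placeholders; `output_size` follows the H x W convention documented in `convert_page_to_numpy`):

from doctr.documents import DocumentFile

pdf = DocumentFile.from_pdf("path/to/doc.pdf")
pages = pdf.as_images(output_size=(1024, 726))   # keeps roughly the A4 aspect ratio
words = pdf.get_words()                          # per page: [(bbox, value), ...]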
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to unshrink polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: the polygon points, as an array of shape (N, 2)
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # Ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1): # top-down pass: each coarser map feeds the next finer one
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature maps is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon: array of coordinates defining the polygon boundary
- canvas: threshold map to fill with polygons
- mask: mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
\ No newline at end of file
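To make the unclip step in `polygon_to_box` above concrete, the same shapely/pyclipper arithmetic run standalone (the rectangle coordinates are made up):

import numpy as np
import pyclipper
from shapely.geometry import Polygon

points = np.array([[10, 10], [110, 10], [110, 40], [10, 40]])
poly = Polygon(points)
distance = poly.area * 1.5 / poly.length   # unclip_ratio = 1.5 -> ~17.3 px here
offset = pyclipper.PyclipperOffset()
offset.AddPath([tuple(p) for p in points.tolist()], pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
expanded = np.asarray(offset.Execute(distance)[0])  # expanded contour, ready for cv2.boundingRect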
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from the LinkNet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num): # labels run from 0 (background) to label_num - 1
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
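The connected-component pass in `bitmap_to_boxes` above can be reproduced standalone (the blob below stands in for one detected text region):

import cv2
import numpy as np

bitmap = np.zeros((64, 64), dtype=np.uint8)
bitmap[10:20, 5:30] = 1
label_num, labelimage = cv2.connectedComponents(bitmap, connectivity=4)
for label in range(1, label_num):       # label 0 is the background
    points = np.array(np.where(labelimage == label)[::-1]).T.astype(np.int32)
    x, y, w, h = cv2.boundingRect(points)   # (5, 10, 25, 10) for the blob above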
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
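For context, the `reparameterize` call above folds the training-time branches of FAST models into plain convolutions before inference. A minimal sketch of invoking it directly, assuming the relative import above resolves to `doctr.models.detection.fast`:

>>> from doctr.models import fast_base
>>> from doctr.models.detection.fast import reparameterize
>>> model = reparameterize(fast_base(pretrained=True))  # fused convs: lower latency, same outputs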
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
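Since `arch` now accepts a model instance as well as a name, a detection model can be configured before being wrapped. A usage sketch, assuming `linknet_resnet18` is importable from `doctr.models` as the ARCHS list above suggests:

>>> from doctr.models import detection_predictor, linknet_resnet18
>>> custom_model = linknet_resnet18(pretrained=True, assume_straight_pages=False)
>>> predictor = detection_predictor(custom_model, assume_straight_pages=False)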
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized TFLite model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.models.recognition.crnn - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
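The greedy decoding rule above (take the argmax per timestep, merge repeated indices, drop blanks) can be mimicked without TensorFlow. A rough numpy sketch, where `vocab` and the blank index are illustrative:

>>> import numpy as np
>>> from itertools import groupby
>>> vocab = "abc"  # blank index = len(vocab) = 3
>>> logits = np.random.rand(10, len(vocab) + 1)  # (SEQ_LEN, NUM_CLASSES + 1)
>>> best_path = logits.argmax(-1)
>>> merged = [idx for idx, _ in groupby(best_path)]  # merge_repeated=True
>>> word = "".join(vocab[idx] for idx in merged if idx < len(vocab))  # drop blanks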
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
- with label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of ground-truth labels (raw words)
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
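In other words, the module turns its projected maps into a softmax distribution over spatial positions and uses it to pool the feature map into a single glimpse vector. A numpy equivalent of the final steps, with illustrative shapes:

>>> import numpy as np
>>> features = np.random.rand(2, 8, 32, 512)  # (N, H, W, C)
>>> scores = np.random.rand(2, 8 * 32)  # flattened attention logits
>>> att = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)  # softmax over H * W
>>> glimpse = (features * att.reshape(2, 8, 32, 1)).sum(axis=(1, 2))  # (N, C)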
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM layers to stack in the decoder
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
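The masking step relies on `tf.sequence_mask`, which marks the first `seq_len + 1` timesteps (the word plus its <eos> token) as valid, so padded positions contribute nothing to the loss. A small demonstration:

>>> import tensorflow as tf
>>> mask = tf.sequence_mask(tf.constant([2, 4]) + 1, maxlen=6)
>>> # row 0: [True, True, True, False, False, False] -> 2 characters + <eos>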
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
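A usage sketch for the updated signature, passing a single word crop as a list; `symmetric_pad` and `batch_size` are forwarded to the preprocessor as shown above:

>>> import numpy as np
>>> from doctr.models import recognition_predictor
>>> predictor = recognition_predictor('crnn_vgg16_bn', pretrained=True, symmetric_pad=True, batch_size=32)
>>> crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)
>>> out = predictor([crop])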
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the general page orientation
+ based on the median line orientation of the segmentation map.
+ Then, rotates the page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
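For rotated documents, the options documented above combine naturally. A sketch:

>>> from doctr.models import ocr_predictor
>>> predictor = ocr_predictor(
...     pretrained=True,
...     assume_straight_pages=False,  # fit rotated boxes
...     export_as_straight_boxes=True,  # but export axis-aligned boxes
...     detect_orientation=True,
... )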
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the general page orientation
+ based on the median line orientation of the segmentation map.
+ Then, rotates the page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
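The KIE predictor takes the same page inputs as the OCR predictor. A usage sketch with the orientation and language options listed above:

>>> import numpy as np
>>> from doctr.models import kie_predictor
>>> model = kie_predictor(pretrained=True, detect_orientation=True, detect_language=True)
>>> page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([page])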
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example::
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example::
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example::
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example::
- >>> from doctr.transforms import RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
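`tf.image.adjust_gamma` computes `gain * img ** gamma`, so with `gamma > 1` mid-tones are darkened and with `gamma < 1` they are brightened. A quick numeric check:

>>> import numpy as np
>>> img = np.array([0.25, 0.5, 1.0])
>>> 1.0 * img ** 2.0  # gamma=2.0, gain=1.0
array([0.0625, 0.25  , 1.    ])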
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations; only one will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability of applying the transformation
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
\ No newline at end of file
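For context, a minimal sketch of how the augmentations removed above compose (assuming the TensorFlow backend and the public doctr.transforms exports):

import tensorflow as tf
from doctr.transforms import OneOf, RandomApply, RandomGamma, RandomJpegQuality

# Apply either a JPEG-quality or a gamma perturbation, half of the time
aug = RandomApply(OneOf([RandomJpegQuality(min_quality=50), RandomGamma()]), p=0.5)
img = tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1)
out = aug(img)  # a tensor with the same shape as the input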
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
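For illustration, a quick sketch of the four tolerance levels (string_match is the module-level helper defined above; it is not part of __all__ and the anyascii package is assumed to be installed):

from doctr.utils.metrics import string_match

# "Élan" vs "elan": only the lower-case anyascii comparison matches
raw, caseless, ascii_match, unicase = string_match("Élan", "elan")
# raw=False, caseless=False, ascii_match=False, unicase=True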
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
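Continuing the docstring example above, a sketch of what summary() yields, one exact-match score per tolerance level:

from doctr.utils import TextMatch

metric = TextMatch()
metric.update(["Hello", "world"], ["hello", "world"])
metric.summary()
# {'raw': 0.5, 'caseless': 1.0, 'anyascii': 0.5, 'unicase': 1.0}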
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
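As a quick numeric check, two 2x2 squares overlapping on half of their area:

import numpy as np
from doctr.utils.metrics import box_iou

boxes_1 = np.array([[0, 0, 2, 2]], dtype=np.float32)
boxes_2 = np.array([[1, 0, 3, 2]], dtype=np.float32)
box_iou(boxes_1, boxes_2)  # array([[0.33333334]]): intersection 2, union 6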
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
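The same squares expressed as 4-point polygons go through the shapely path and give the same result:

import numpy as np
from doctr.utils.metrics import polygon_iou

polys_1 = np.array([[[0, 0], [2, 0], [2, 2], [0, 2]]], dtype=np.float32)
polys_2 = np.array([[[1, 0], [3, 0], [3, 2], [1, 2]]], dtype=np.float32)
polygon_iou(polys_1, polys_2)  # array([[0.33333334]], dtype=float32)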
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
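A small sketch of the suppression behaviour on three scored boxes:

import numpy as np
from doctr.utils.metrics import nms

boxes = np.array([
    [0.0, 0.0, 1.0, 1.0, 0.9],   # kept: highest score
    [0.05, 0.0, 1.0, 1.0, 0.8],  # suppressed: IoU ≈ 0.95 with the first box
    [2.0, 2.0, 3.0, 3.0, 0.7],   # kept: no overlap with the first box
])
nms(boxes, thresh=0.5)  # [0, 2]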
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
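Running the docstring example above: the single ground-truth box best overlaps the first prediction with an IoU of 0.49, just under the threshold, so no pair is validated:

import numpy as np
from doctr.utils import LocalizationConfusion

metric = LocalizationConfusion(iou_thresh=0.5)
metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
metric.summary()  # (0.0, 0.0, ~0.25): mean IoU is (0.49 + 0) / 2 over the two predictions, rounded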
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ ... ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
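Same configuration for the end-to-end metric: since no box pair clears the IoU threshold, the correctly recognised 'hello' does not count towards recall or precision:

import numpy as np
from doctr.utils import OCRMetric

metric = OCRMetric(iou_thresh=0.5)
metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
              ["hello"], ["hello", "world"])
recall, precision, mean_iou = metric.summary()  # every recall/precision entry is 0.0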
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ ... np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall, precision and mean IoU scores
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
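DetectionMetric follows the same assignment logic but compares class indices instead of strings, e.g. with the docstring values:

import numpy as np
from doctr.utils import DetectionMetric

metric = DetectionMetric(iou_thresh=0.5)
metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
              np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
metric.summary()  # (0.0, 0.0, ~0.25): the best IoU (0.49) misses the 0.5 threshold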
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if geometry.shape != (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
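A short sketch of the dispatch above: a 2-point tuple yields a Rectangle, a (4, 2) array yields a Polygon, both scaled from relative to absolute page coordinates:

import numpy as np
from doctr.utils.visualization import create_obj_patch

rect = create_obj_patch(((0.1, 0.1), (0.4, 0.2)), (768, 1024), color=(0, 0, 1))
poly = create_obj_patch(np.array([[0.1, 0.1], [0.4, 0.1], [0.4, 0.2], [0.1, 0.2]]), (768, 1024))
# rect is a matplotlib.patches.Rectangle, poly a matplotlib.patches.Polygon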
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
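For instance, get_colors(3) returns three (r, g, b) tuples with hues evenly spread around the color wheel; lightness and saturation are slightly randomized, so exact values vary between calls:

colors = get_colors(3)  # e.g. [(0.94, 0.27, 0.30), (0.25, 0.93, 0.31), (0.33, 0.28, 0.95)]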
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mlp Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mlp Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
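A minimal usage sketch (synthetic image and box values, chosen for illustration):

import numpy as np
from matplotlib import pyplot as plt
from doctr.utils.visualization import draw_boxes

image = np.zeros((200, 300, 3), dtype=np.uint8)
boxes = np.array([[0.1, 0.2, 0.5, 0.6]])  # relative (xmin, ymin, xmax, ymax)
draw_boxes(boxes, image, color=(0, 255, 0))
plt.show()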
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-.. autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its own way to load a sample, but batch aggregation and the underlying iteration are deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
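For reference, a sketch of encoding sequences against one of these vocabs (the VOCABS import and the target_size parameter are assumptions based on the doctr.datasets API):

from doctr.datasets import VOCABS, encode_sequences

encoded = encode_sequences(sequences=["hello"], vocab=VOCABS["french"], target_size=10)
# -> integer array of shape (1, 10): one vocab index per character, plus padding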
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same height but in different columns belong to two distinct Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developer mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Whether performed jointly or separately, each task calls for a specific type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: a module that makes model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up, then measure its average speed over 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.12xlarge AWS instance (Xeon Platinum 8275L CPU) to run the experiments.
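For the record, a rough sketch of this protocol (assuming `model` is an instantiated detection model and the TensorFlow backend):

import time
import tensorflow as tf

inp = tf.random.uniform((1, 1024, 1024, 3))
for _ in range(100):   # warm-up
    _ = model(inp)
start = time.time()
for _ in range(1000):  # timed batches of 1 frame
    _ = model(inp)
fps = 1000 / (time.time() - start)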
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (a binary segmentation map, for instance) into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors let you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 30595 word-level crops, which might not be representative enough of the model capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 32, 128, 3] as a warm-up, then measure its average speed over 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.12xlarge AWS instance (Xeon Platinum 8275L CPU) to run the experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence) into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Predictors combine the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
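-
-As a minimal usage sketch (the random array stands in for a real HWC word crop; the exact output structure may vary between versions):
-
- >>> import numpy as np
- >>> from doctr.models.recognition import recognition_predictor
- >>> predictor = recognition_predictor('crnn_vgg16_bn', pretrained=True)
- >>> crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)  # stand-in for a word crop
- >>> out = predictor([crop])  # one decoded string per crop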
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| AWS Textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All the recognition models used in these predictors are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the predictor, warm it up, and then measure its average end-to-end speed on the datasets, with a batch size of 1.
-Experiments were run on an AWS c5.12xlarge instance (Xeon Platinum 8275L CPU).
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-These architectures involve one stage of text detection and one stage of text recognition. The detection output is used to produce cropped images that are passed to the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
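-
-As an illustrative sketch ("path/to/your/doc.pdf" is a placeholder path):
-
- >>> from doctr.documents import DocumentFile
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
- >>> result = model(pages)  # structured Document (pages, blocks, lines, words)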
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
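-
-As an illustrative sketch (assuming each helper takes a Keras model, with quantization additionally needing an input shape, and returns a serialized bytestring; check the signatures above):
-
- >>> from doctr.models import db_resnet50
- >>> from doctr.models.export import convert_to_tflite, convert_to_fp16, quantize_model
- >>> model = db_resnet50(pretrained=True)
- >>> tflite_model = convert_to_tflite(model)
- >>> fp16_model = convert_to_fp16(model)
- >>> quantized_model = quantize_model(model, (1024, 1024, 3))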
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel <https://www.tensorflow.org/guide/saved_model>`_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both the training and inference procedures. Drawing inspiration from the design of torchvision, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
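-
-As a minimal sketch of composing transformations (sizes and values are illustrative):
-
- >>> import tensorflow as tf
- >>> from doctr.transforms import Compose, Resize, ToGray
- >>> transfo = Compose([Resize((32, 32)), ToGray()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], maxval=1, dtype=tf.float32))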
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module gathers non-core features that complement the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
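-
-As an illustrative sketch ("path/to/your/doc.pdf" is a placeholder; this assumes an end-to-end predictor from doctr.models):
-
- >>> import matplotlib.pyplot as plt
- >>> from doctr.documents import DocumentFile
- >>> from doctr.models import ocr_predictor
- >>> from doctr.utils.visualization import visualize_page
- >>> model = ocr_predictor(pretrained=True)
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
- >>> result = model(pages)
- >>> visualize_page(result.pages[0].export(), pages[0])
- >>> plt.show()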
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model's performance.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
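-
-As a minimal sketch of the metric workflow (assuming an update/summary interface; the strings are illustrative):
-
- >>> from doctr.utils.metrics import ExactMatch
- >>> metric = ExactMatch(ignore_case=True)
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()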
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-
-
-
-
-
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
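-
-As a minimal sketch (the vocab string is an illustrative digits-plus-lowercase example):
->>> from doctr.datasets import encode_sequences
->>> encoded = encode_sequences(["hello"], vocab="0123456789abcdefghijklmnopqrstuvwxyz", target_size=8)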
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size
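-
-As a minimal construction sketch (the values are illustrative):
->>> from doctr.documents import Word
->>> word = Word("hello", 0.95, ((0.1, 0.1), (0.3, 0.2)))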
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words on the same horizontal level but in different columns belong to two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF file, returned as a bytes stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
Using your ONNX exported model
-
+
diff --git a/using_doctr/using_models.html b/using_doctr/using_models.html
index 13cb06116b..5c80dbf62d 100644
--- a/using_doctr/using_models.html
+++ b/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1249,7 +1249,7 @@ Advanced options
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/cord.html b/v0.1.0/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.0/_modules/doctr/datasets/cord.html
+++ b/v0.1.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/detection.html b/v0.1.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.0/_modules/doctr/datasets/detection.html
+++ b/v0.1.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/funsd.html b/v0.1.0/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.0/_modules/doctr/datasets/funsd.html
+++ b/v0.1.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic03.html b/v0.1.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.0/_modules/doctr/datasets/ic03.html
+++ b/v0.1.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ic13.html b/v0.1.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.0/_modules/doctr/datasets/ic13.html
+++ b/v0.1.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiit5k.html b/v0.1.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/iiithws.html b/v0.1.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/imgur5k.html b/v0.1.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/loader.html b/v0.1.0/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.0/_modules/doctr/datasets/loader.html
+++ b/v0.1.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/mjsynth.html b/v0.1.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/ocr.html b/v0.1.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.0/_modules/doctr/datasets/ocr.html
+++ b/v0.1.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/recognition.html b/v0.1.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.0/_modules/doctr/datasets/recognition.html
+++ b/v0.1.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/sroie.html b/v0.1.0/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.0/_modules/doctr/datasets/sroie.html
+++ b/v0.1.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svhn.html b/v0.1.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.0/_modules/doctr/datasets/svhn.html
+++ b/v0.1.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/svt.html b/v0.1.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.0/_modules/doctr/datasets/svt.html
+++ b/v0.1.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/synthtext.html b/v0.1.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/utils.html b/v0.1.0/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.0/_modules/doctr/datasets/utils.html
+++ b/v0.1.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.0/_modules/doctr/datasets/wildreceipt.html b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.0/_modules/doctr/io/elements.html b/v0.1.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.0/_modules/doctr/io/elements.html
+++ b/v0.1.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.0/_modules/doctr/io/html.html b/v0.1.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.0/_modules/doctr/io/html.html
+++ b/v0.1.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/base.html b/v0.1.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.0/_modules/doctr/io/image/base.html
+++ b/v0.1.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.0/_modules/doctr/io/image/tensorflow.html b/v0.1.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/io/pdf.html b/v0.1.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.0/_modules/doctr/io/pdf.html
+++ b/v0.1.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.0/_modules/doctr/io/reader.html b/v0.1.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.0/_modules/doctr/io/reader.html
+++ b/v0.1.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/classification/zoo.html b/v0.1.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
<
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/detection/zoo.html b/v0.1.0/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/factory/hub.html b/v0.1.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.0/_modules/doctr/models/factory/hub.html
+++ b/v0.1.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/models/recognition/zoo.html b/v0.1.0/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/models/zoo.html b/v0.1.0/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.0/_modules/doctr/models/zoo.html
+++ b/v0.1.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/base.html b/v0.1.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.0/_modules/doctr/utils/metrics.html b/v0.1.0/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.0/_modules/doctr/utils/metrics.html
+++ b/v0.1.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.0/_modules/doctr/utils/visualization.html b/v0.1.0/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.0/_modules/doctr/utils/visualization.html
+++ b/v0.1.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.0/_modules/index.html b/v0.1.0/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.0/_modules/index.html
+++ b/v0.1.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.0/_sources/getting_started/installing.rst.txt b/v0.1.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.0/_sources/getting_started/installing.rst.txt
+++ b/v0.1.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.0/_static/basic.css b/v0.1.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.0/_static/basic.css
+++ b/v0.1.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.0/_static/doctools.js b/v0.1.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.0/_static/doctools.js
+++ b/v0.1.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.0/_static/language_data.js b/v0.1.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.0/_static/language_data.js
+++ b/v0.1.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.0/_static/searchtools.js b/v0.1.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.0/_static/searchtools.js
+++ b/v0.1.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
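
As the surrounding guard (`typeof Scorer === "undefined"`) suggests, a page can supply its own Scorer before searchtools.js loads; the hunk above updates the documented score callback to destructure the new seventh `kind` element. A minimal, hypothetical override might look like the following sketch (the weighting fields of the real default Scorer are omitted here):

// Hypothetical override: declare Scorer before searchtools.js is loaded so
// the `typeof Scorer === "undefined"` guard above keeps this object instead
// of the default. Only the optional `score` callback is sketched; the real
// Scorer also defines static weighting fields used elsewhere in this file.
const Scorer = {
  score: (result) => {
    // Result tuples now carry seven elements (see the updated comment above).
    const [docname, title, anchor, descr, score, filename, kind] = result;
    // Illustrative tweak: nudge title matches above same-score text matches.
    return kind === "title" ? score + 1 : score;
  },
};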
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
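
Since `_displayItem` now tags each result with a `kind-${kind}` class, a theme can target results by type. A small, hypothetical snippet could decorate them from JavaScript; the selectors reuse markup built in this file (`#search-results` and the `ul.search` list), while the styling choices are purely illustrative:

// Hypothetical theme helper: decorate rendered search results by kind.
// The kind-* classes are added by _displayItem above; the container id and
// list class (#search-results, ul.search) are created later in this file.
document.querySelectorAll("#search-results ul.search li").forEach((li) => {
  if (li.classList.contains("kind-title")) {
    li.style.fontWeight = "bold";        // section-title matches stand out
  } else if (li.classList.contains("kind-object")) {
    li.style.fontFamily = "monospace";   // code objects (classes, functions)
  }
});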
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
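
The replacement of the single interpolated string with `Documentation.ngettext` lets translation catalogs choose between singular and plural messages instead of relying on "page(s)". A simplified sketch of that lookup follows; the real helper in doctools.js consults the loaded translation catalog, whereas this fallback only handles English:

// Simplified sketch of plural-aware message selection, assuming no
// translation catalog is loaded (English fallback only).
function ngettext(singular, plural, n) {
  return n === 1 ? singular : plural;
}

// Mirrors the call in the hunk above:
const resultCount = 3;
const status = ngettext(
  "Search finished, found one page matching the search query.",
  "Search finished, found ${resultCount} pages matching the search query.",
  resultCount,
).replace("${resultCount}", resultCount);
// => "Search finished, found 3 pages matching the search query."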
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
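
The updated comment spells out the contract of `_orderResultsByScoreThenName`: ascending score, so that `pop()` in `_displayNextItem` serves the best match first, with alphabetical tie-breaking. A hedged sketch of such a comparator is shown below; field positions follow the seven-element tuple documented above, and the real implementation may differ in detail:

// Sketch of a score-then-name comparator consistent with the comment above.
// Index 4 holds the score; index 1 holds the title. Lower scores sort first
// because results are consumed from the end of the array via pop().
const orderResultsByScoreThenName = (a, b) => {
  if (a[4] !== b[4]) return a[4] - b[4];
  // Ties: reverse-alphabetical by title, so pop() yields alphabetical order.
  const [ta, tb] = [a[1].toLowerCase(), b[1].toLowerCase()];
  return ta > tb ? -1 : ta < tb ? 1 : 0;
};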
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
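
Taken together, the four hunks above thread a `kind` value (title, index, object, or text) through every result the query produces. For illustration, one result tuple in the new format might look like this; the concrete score and paths are hypothetical:

// Illustrative seven-element result tuple in the new format; the last slot
// holds one of SearchResultKind's four values. All concrete values here are
// made up for the example.
const exampleResult = [
  "getting_started/installing",      // docname
  "Installation",                    // title
  "",                                // anchor
  null,                              // descr
  15,                                // score
  "getting_started/installing.html", // filename
  "title",                           // kind (SearchResultKind.title)
];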
diff --git a/v0.1.0/changelog.html b/v0.1.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.0/changelog.html
+++ b/v0.1.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.0/community/resources.html b/v0.1.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.0/community/resources.html
+++ b/v0.1.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.0/contributing/code_of_conduct.html b/v0.1.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.0/contributing/code_of_conduct.html
+++ b/v0.1.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.0/contributing/contributing.html b/v0.1.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.0/contributing/contributing.html
+++ b/v0.1.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.0/genindex.html b/v0.1.0/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.0/genindex.html
+++ b/v0.1.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.0/getting_started/installing.html b/v0.1.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.0/getting_started/installing.html
+++ b/v0.1.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.0/index.html b/v0.1.0/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.0/index.html
+++ b/v0.1.0/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.0/modules/contrib.html b/v0.1.0/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.0/modules/contrib.html
+++ b/v0.1.0/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.0/modules/datasets.html b/v0.1.0/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.0/modules/datasets.html
+++ b/v0.1.0/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.0/modules/io.html b/v0.1.0/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.0/modules/io.html
+++ b/v0.1.0/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.0/modules/models.html b/v0.1.0/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.0/modules/models.html
+++ b/v0.1.0/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/transforms.html b/v0.1.0/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.0/modules/transforms.html
+++ b/v0.1.0/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.0/using_doctr/custom_models_training.html b/v0.1.0/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.0/using_doctr/custom_models_training.html
+++ b/v0.1.0/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.0/using_doctr/running_on_aws.html b/v0.1.0/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.0/using_doctr/running_on_aws.html
+++ b/v0.1.0/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
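The searchtools.js change above tags each search-result list item with a kind-${kind} class (values from SearchResultKind: index, object, text, title) so that themes can style result types without patching searchtools.js itself. A minimal sketch of what a theme stylesheet could add; these rules are illustrative assumptions, not part of this patch, and they reuse the ul.search markup styled in basic.css:

/* Hypothetical theme rules keyed on the new kind-* classes */
ul.search li.kind-title  { font-weight: bold; }      /* page/section titles */
ul.search li.kind-index  { font-style: italic; }     /* index entries */
ul.search li.kind-object { font-family: monospace; } /* code objects */
ul.search li.kind-text   { opacity: 0.85; }          /* full-text matches */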
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
- doctr.datasets.core - docTR documentation
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
\ No newline at end of file
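Since this removed VisionDataset base class is referenced by every dataset diff below, a short sketch of how it was meant to be subclassed may help; the URL, hash, and class name here are placeholders, not doctr APIs:

# Hypothetical subclass of the removed VisionDataset (placeholder URL/hash).
class MyDataset(VisionDataset):
    URL = "https://example.com/my_dataset.zip"  # placeholder
    SHA256 = None  # skipping the integrity check in this sketch

    def __init__(self, download: bool = False) -> None:
        # Downloads into ~/.cache/doctr/datasets and extracts the archive
        super().__init__(self.URL, "my_dataset.zip", self.SHA256, True, download=download)
        # self._root then points at the extracted folder; subclasses fill self.data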
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# # List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
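The rework drops the multithreaded fetch in favour of a plain sequential map and exposes collate_fn; a minimal usage sketch (assumes a downloadable CORD archive):

# Sketch: the reworked DataLoader, with and without a custom collate_fn.
from doctr.datasets import CORD, DataLoader

train_set = CORD(train=True, download=True)
train_loader = DataLoader(train_set, shuffle=True, batch_size=32, drop_last=True)
print(len(train_loader))  # number of batches, via the new __len__
images, targets = next(iter(train_loader))  # default_collate stacks images on axis 0

# Custom collation: keep each batch as a plain list of samples
raw_loader = DataLoader(train_set, batch_size=8, collate_fn=lambda samples: list(samples))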
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder the 8 flat coordinates into (4, 2) arrays of (x, y) corner points
+ # (top left, top right, bottom right, bottom left); empty rows were already filtered above
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionnary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char is still not in vocab, fall back to the unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
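A worked sketch of encode_sequences with all three special symbols; the only constraint is that eos, sos, and pad must lie outside the vocab's index range:

# Sketch: padding/EOS/SOS behaviour of the reworked encode_sequences.
from doctr.datasets.utils import encode_sequences

vocab = "abc"  # indices 0..2, so 3, 4, 5 are free for the special symbols
out = encode_sequences(["ab", "c"], vocab, eos=3, sos=5, pad=4)
# target_size resolves to 5 (longest word + EOS + SOS + PAD):
# [[5 0 1 3 4]   "ab" -> SOS a b EOS PAD
#  [5 2 3 4 4]]  "c"  -> SOS c EOS PAD PAD
print(out)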
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
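The grouping step in pre_transform_multiclass is worth seeing on concrete values; a self-contained sketch of the same logic:

# Sketch: grouping polygons under their class names, as done above.
import numpy as np

polys = np.zeros((3, 4, 2), dtype=np.float32)  # three dummy (already relative) polygons
classes = ["words", "title", "words"]
boxes_dict = {k: [] for k in sorted(set(classes))}  # {"title": [], "words": []}
for name, poly in zip(classes, polys):
    boxes_dict[name].append(poly)
boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
# boxes_dict["words"].shape == (2, 4, 2); boxes_dict["title"].shape == (1, 4, 2)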
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- doctr.documents.elements - docTR documentation
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
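-
-# Usage sketch (illustrative only, assuming the element classes above): build a
-# one-word document from scratch and render its text.
-# >>> word = Word("hello", 0.99, ((0.1, 0.1), (0.3, 0.2)))
-# >>> line = Line([word])
-# >>> block = Block(lines=[line])
-# >>> page = Page([block], page_idx=0, dimensions=(595, 842))
-# >>> doc = Document([page])
-# >>> doc.render()
-# 'hello'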
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- doctr.documents.reader - docTR documentation
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
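-
-# Sketch (illustrative): read_img also accepts raw bytes, e.g. from a network payload.
-# >>> with open("path/to/your/doc.jpg", "rb") as f:
-# ... page = read_img(f.read(), output_size=(1024, 1024))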
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document, loaded as a fitz.Document
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 pdf;
- if you want to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
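-
-# Sketch (illustrative): render the first page at twice the default 72 dpi.
-# >>> doc = read_pdf("path/to/your/doc.pdf")
-# >>> page = convert_page_to_numpy(doc[0], default_scales=(2, 2)) # (2, 2) ~ 144 dpi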
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
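-
-# Usage sketch (illustrative): images and native annotations from the same PDF.
-# >>> pdf = DocumentFile.from_pdf("path/to/your/doc.pdf")
-# >>> pages = pdf.as_images() # one H x W x 3 ndarray per page
-# >>> words = pdf.get_words() # per page: list of ((xmin, ymin, xmax, ymax), value)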
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to unshrink polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: The first parameter.
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Keep the polygon with the most points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to a ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
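-
- # Sketch (illustrative): a 100x100 square has area 10,000 and perimeter 400, so it is
- # offset outwards by 10,000 * 1.5 / 400 = 37.5 pixels before taking the bounding box.
- # >>> points = np.array([[0, 0], [100, 0], [100, 100], [0, 100]])
- # >>> DBPostProcessor().polygon_to_box(points) # -> expanded (x, y, w, h)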
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
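-
- # Sketch (illustrative): turn a probability map into relative (N, 5) boxes.
- # >>> post_proc = DBPostProcessor(box_thresh=0.1, bin_thresh=0.3)
- # >>> prob_map = np.random.rand(1024, 1024).astype(np.float32)
- # >>> boxes = post_proc.bitmap_to_boxes(prob_map, prob_map >= 0.3)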
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1): # merge each map with its upsampled coarser neighbour
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature maps is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.ndarray,
- ys: np.ndarray,
- a: np.ndarray,
- b: np.ndarray,
- eps: float = 1e-7,
- ) -> np.ndarray:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- the map of distances to the [ab] segment, of shape (height, width)
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
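-
- # Sketch (illustrative): distances from a 2x2 pixel grid to the segment (0, 0)-(1, 0).
- # >>> xs = np.array([[0., 1.], [0., 1.]])
- # >>> ys = np.array([[0., 0.], [1., 1.]])
- # >>> DBNet.compute_distance(xs, ys, np.array([0., 0.]), np.array([1., 0.]))
- # array([[0., 0.], [1., 1.]])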
-
- def draw_thresh_map(
- self,
- polygon: np.ndarray,
- canvas: np.ndarray,
- mask: np.ndarray,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon : array of coord., to draw the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
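-
-# Training sketch (illustrative): a supervised forward pass returning the loss.
-# >>> model = db_resnet50(pretrained=False)
-# >>> images = tf.random.uniform((1, 1024, 1024, 3))
-# >>> targets = [{"boxes": np.array([[0.1, 0.1, 0.4, 0.2]]), "flags": np.array([False])}]
-# >>> loss = model(images, target=targets, training=True)["loss"]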
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num + 1):
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@ Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
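+
+# Usage sketch (illustrative): a predictor that fits rotated boxes instead of straight ones.
+# >>> rotated_predictor = detection_predictor("db_resnet50", pretrained=True, assume_straight_pages=False)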
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
-        with label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
-        Args:
-            model_output: predicted logits of the model
-            target: list of ground-truth label strings for the batch
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
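The `CTCPostProcessor` in the deleted file above performs greedy CTC decoding (argmax per timestep, merge repeated symbols, drop blanks) before mapping indices back to characters. A framework-free sketch of the same collapse rule, assuming logits of shape (batch, seq_len, len(vocab) + 1) with the blank at index len(vocab):

import numpy as np

def ctc_greedy_decode(logits: np.ndarray, vocab: str) -> list:
    # Greedy CTC: take the argmax path, merge repeated symbols, then drop blanks
    blank = len(vocab)
    words = []
    for seq in logits.argmax(axis=-1):  # one index sequence per batch element
        chars, prev = [], None
        for idx in seq:
            if idx != blank and idx != prev:
                chars.append(vocab[idx])
            prev = idx
        words.append("".join(chars))
    return words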
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
-        # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embeded_symbol: shape (N, embedding_units)
- embeded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embeded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
-            # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
-            # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
-    Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
-
\ No newline at end of file
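`SAR.compute_loss` in the deleted file above masks the per-timestep cross-entropy past the `<eos>` position of each ground-truth word. A small numpy sketch of that masking rule (assuming `cce` holds per-step losses of shape (batch, time) and `seq_len` the label lengths, mirroring the method above):

import numpy as np

def masked_sequence_loss(cce: np.ndarray, seq_len: np.ndarray) -> np.ndarray:
    # Keep the first seq_len + 1 steps (the word plus its <eos> token), zero out the rest
    batch, time = cce.shape
    keep = seq_len + 1
    mask = np.arange(time)[None, :] < keep[:, None]  # boolean mask of shape (batch, time)
    return (cce * mask).sum(axis=1) / keep.astype(np.float32)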
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
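As a usage sketch of the refactored `recognition_predictor` above (any name from the new ARCHS list works; the crop below is a random placeholder):

import numpy as np
from doctr.models import recognition_predictor

predictor = recognition_predictor("crnn_vgg16_bn", pretrained=True, batch_size=64)

# Pass cropped word images as uint8 numpy arrays
crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)
words = predictor([crop])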
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
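The reworked zoo above exposes the same options on `ocr_predictor` and `kie_predictor`. A short sketch combining the documented flags (the chosen values are illustrative):

import numpy as np
from doctr.models import kie_predictor, ocr_predictor

page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)

# Default straight-page fast path
model = ocr_predictor("db_resnet50", "crnn_vgg16_bn", pretrained=True)
out = model([page])

# Rotated documents: detect rotated boxes but export straight ones
rotated = ocr_predictor(
    pretrained=True,
    assume_straight_pages=False,
    export_as_straight_boxes=True,
    detect_orientation=True,
)

# The KIE variant accepts the same arguments
kie_model = kie_predictor("db_resnet50", "crnn_vgg16_bn", pretrained=True)
kie_out = kie_model([page])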
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
-        >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
-        >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
-    Example::
-        >>> from doctr.transforms import RandomBrightness
-        >>> import tensorflow as tf
-        >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
-    Example::
-        >>> from doctr.transforms import RandomContrast
-        >>> import tensorflow as tf
-        >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
-    Example::
-        >>> from doctr.transforms import RandomSaturation
-        >>> import tensorflow as tf
-        >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
-        >>> from doctr.transforms import RandomHue
-        >>> import tensorflow as tf
-        >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
-    Example::
-        >>> from doctr.transforms import RandomGamma
-        >>> import tensorflow as tf
-        >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
-        >>> from doctr.transforms import RandomJpegQuality
-        >>> import tensorflow as tf
-        >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
-        >>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
-        >>> import tensorflow as tf
-        >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
-        >>> from doctr.transforms import RandomApply, RandomGamma
-        >>> import tensorflow as tf
-        >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
\ No newline at end of file
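A short sketch chaining the modules defined in the deleted file above into one augmentation pipeline (the shapes, probabilities and normalization statistics are illustrative):

import tensorflow as tf
from doctr.transforms import (
    ColorInversion, Compose, Normalize, OneOf, RandomApply, RandomGamma, Resize
)

# Resize first, then apply stochastic photometric augmentations, then normalize
transfos = Compose([
    Resize((32, 128), preserve_aspect_ratio=True),
    RandomApply(RandomGamma(), p=0.3),
    OneOf([ColorInversion(), RandomGamma()]),
    Normalize(mean=(0.694, 0.695, 0.693), std=(0.299, 0.296, 0.301)),
])

out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))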
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
            gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
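For reference, a minimal run of the reworked class — note that `summary()` now returns a dictionary with the four match rates instead of a single float (a sketch; the values follow from the update below):

>>> from doctr.utils import TextMatch
>>> metric = TextMatch()
>>> metric.update(["Hello", "world"], ["hello", "world"])
>>> metric.summary()
{'raw': 0.5, 'caseless': 1.0, 'anyascii': 0.5, 'unicase': 1.0}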
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
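A small worked example of `box_iou` on hypothetical boxes in absolute coordinates:

>>> import numpy as np
>>> boxes_a = np.array([[0, 0, 100, 100]], dtype=np.float32)
>>> boxes_b = np.array([[0, 0, 70, 70], [110, 95, 200, 150]], dtype=np.float32)
>>> box_iou(boxes_a, boxes_b)  # first pair: 4900 / 10000 = 0.49; second pair: no overlap -> 0.0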
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
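The same boxes as above, expressed as 4-point polygons, go through the Shapely-based path and yield the identical result (a sketch, assuming `shapely` is installed):

>>> import numpy as np
>>> poly_a = np.array([[[0, 0], [100, 0], [100, 100], [0, 100]]], dtype=np.float32)
>>> poly_b = np.array([[[0, 0], [70, 0], [70, 70], [0, 70]]], dtype=np.float32)
>>> polygon_iou(poly_a, poly_b)  # -> array([[0.49]], dtype=float32)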
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes of shape (*, 5), in format (xmin, ymin, xmax, ymax, score)
+ thresh: IoU threshold above which a box is suppressed
+
+ Returns:
+ -------
+ A list of box indices to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
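A sketch of the new `nms` helper on three hypothetical scored boxes — the second box overlaps the first with IoU ≈ 0.82, well above the threshold, and is discarded:

>>> import numpy as np
>>> boxes = np.array([
>>>     [0, 0, 100, 100, 0.9],
>>>     [5, 5, 105, 105, 0.8],
>>>     [200, 200, 300, 300, 0.7],
>>> ])
>>> nms(boxes, thresh=0.5)  # -> [0, 2]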
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes of shape (N, 4), or (N, 4, 2) for rotated ones
+ preds: a set of relative bounding boxes of shape (M, 4), or (M, 4, 2) for rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
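The update step above replaces the removed `assign_pairs` helper with SciPy's Hungarian solver followed by an explicit IoU threshold; here is that matching logic in isolation, on a hypothetical IoU matrix:

>>> import numpy as np
>>> from scipy.optimize import linear_sum_assignment
>>> iou_mat = np.array([[0.8, 0.1], [0.2, 0.4]])  # rows: ground truths, columns: predictions
>>> gt_idx, pred_idx = linear_sum_assignment(-iou_mat)  # negate to maximize the total IoU
>>> int((iou_mat[gt_idx, pred_idx] >= 0.5).sum())  # only the (0, 0) pair clears iou_thresh=0.5
1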
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes of shape (N, 4), or (N, 4, 2) for rotated ones
+ pred_boxes: a set of relative bounding boxes of shape (M, 4), or (M, 4, 2) for rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
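A minimal run of the revised OCRMetric (a sketch; the first prediction overlaps its ground truth with IoU ≈ 0.77, which clears the default threshold, while the second stays unmatched):

>>> import numpy as np
>>> from doctr.utils import OCRMetric
>>> metric = OCRMetric(iou_thresh=0.5)
>>> metric.update(np.array([[0, 0, 80, 80]]), np.array([[0, 0, 70, 70], [110, 95, 200, 150]]),
>>>               ["hello"], ["hello", "world"])
>>> recall, precision, mean_iou = metric.summary()
>>> recall["raw"], precision["raw"]  # 1 match out of 1 ground truth and 2 predictions
(1.0, 0.5)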
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes of shape (N, 4), or (N, 4, 2) for rotated ones
+ pred_boxes: a set of relative bounding boxes of shape (M, 4), or (M, 4, 2) for rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision of the class predictions and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
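And the analogous sketch for the new DetectionMetric, with integer class indices in place of strings:

>>> import numpy as np
>>> from doctr.utils import DetectionMetric
>>> metric = DetectionMetric(iou_thresh=0.5)
>>> metric.update(np.array([[0, 0, 80, 80]]), np.array([[0, 0, 80, 80], [110, 95, 200, 150]]),
>>>               np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
>>> metric.summary()  # the first prediction matches perfectly, the second is unmatched
(1.0, 0.5, 0.5)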
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
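A sketch of the dispatching behavior, with hypothetical relative geometries on a 600×800 page:

>>> import numpy as np
>>> # straight box: a 2-tuple of (x, y) points -> routed to rect_patch
>>> create_obj_patch(((0.1, 0.1), (0.4, 0.2)), (600, 800), color=(0, 0, 1))
>>> # rotated box: a (4, 2) array -> routed to polygon_patch
>>> create_obj_patch(np.array([[0.1, 0.1], [0.4, 0.1], [0.4, 0.2], [0.1, 0.2]]), (600, 800))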
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
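Note that the random lightness/saturation jitter makes the palette non-deterministic; a quick sanity check:

>>> palette = get_colors(3)  # three hues spaced 120 degrees apart
>>> len(palette), all(0.0 <= c <= 1.0 for rgb in palette for c in rgb)
(3, True)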
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
            image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mlp Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mlp Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
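A short usage sketch (hypothetical relative box on a blank canvas; requires OpenCV and matplotlib):

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> canvas = np.full((200, 300, 3), 255, dtype=np.uint8)  # white image
>>> draw_boxes(np.array([[0.1, 0.1, 0.5, 0.4]]), canvas)  # default edge color is (0, 0, 255)
>>> plt.show()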
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-..autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developper mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Either performed at once or separately, to each task corresponds a type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors lets you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, we warm-up the model and then we measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection will be used to produces cropped images that will be passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedure. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module regroups non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model performances.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
- doctr.datasets - docTR documentation
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its own way of loading a sample, but batch aggregation and the underlying iteration are deferred to a dedicated object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-docTR Vocabs
-
-Name           Size  Characters
-digits         10    0123456789
-ascii_letters  52    abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-punctuation    32    !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-currency       5     £€¥¢฿
-latin          96    0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-french         154   0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
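As a sketch, these vocabs are assumed to be exposed as a plain string mapping (the ``VOCABS`` name is an assumption):

    >>> from doctr.datasets import VOCABS  # assumed export of the vocab mapping
    >>> len(VOCABS["french"])
    154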
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
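For instance, a minimal sketch with a toy vocab:

    >>> from doctr.datasets import encode_sequences
    >>> encoded = encode_sequences(sequences=["hello", "world"], vocab="abcdefghijklmnopqrstuvwxyz", target_size=10)
    >>> encoded.shape  # one padded row per input sequence
    (2, 10)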
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
- doctr.documents - docTR documentation
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words that are spatially aligned and meant to be read together (on a two-column page, words at the same height but in different columns belong to two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and the confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
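To make the hierarchy concrete, here is a hedged sketch assembling the elements bottom-up (the ``Document`` constructor signature is assumed by analogy with the classes above):

    >>> from doctr.documents import Word, Line, Block, Page, Document
    >>> word = Word(value="docTR", confidence=0.98, geometry=((0.1, 0.1), (0.3, 0.15)))
    >>> line = Line(words=[word])
    >>> block = Block(lines=[line])
    >>> page = Page(blocks=[block], page_idx=0, dimensions=(842, 595))
    >>> doc = Document(pages=[page])  # signature assumed by analogy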
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into images in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF byte stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple file formats
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare the speed & performance of your own architectures with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
@@ -364,7 +381,7 @@ Contents
Source code for doctr.datasets.cord
Source code for doctr.datasets.detection
Source code for doctr.datasets.doc_artefacts
Source code for doctr.datasets.funsd
Source code for doctr.datasets.generator.tensorflow
Source code for doctr.datasets.ic03
Source code for doctr.datasets.ic13
Source code for doctr.datasets.iiit5k
Source code for doctr.datasets.iiithws
Source code for doctr.datasets.imgur5k
Source code for doctr.datasets.loader
Source code for doctr.datasets.mjsynth
Source code for doctr.datasets.ocr
Source code for doctr.datasets.recognition
Source code for doctr.datasets.sroie
Source code for doctr.datasets.svhn
Source code for doctr.datasets.svt
Source code for doctr.datasets.synthtext
Source code for doctr.datasets.utils
Source code for doctr.datasets.wildreceipt
Source code for doctr.io.elements
Source code for doctr.io.html
Source code for doctr.io.image.base
Source code for doctr.io.image.tensorflow
Source code for doctr.io.pdf
Source code for doctr.io.reader
Source code for doctr.models.classification.magc_resnet.tensorflow
Source code for doctr.models.classification.mobilenet.tensorflow
Source code for doctr.models.classification.resnet.tensorflow
Source code for doctr.models.classification.textnet.tensorflow
Source code for doctr.models.classification.vgg.tensorflow
Source code for doctr.models.classification.vit.tensorflow
Source code for doctr.models.classification.zoo
Source code for doctr.models.detection.differentiable_binarization.tensorflow
Source code for doctr.models.detection.fast.tensorflow
Source code for doctr.models.detection.linknet.tensorflow
Source code for doctr.models.detection.zoo
Source code for doctr.models.factory.hub
Source code for doctr.models.recognition.crnn.tensorflow
Source code for doctr.models.recognition.master.tensorflow
Source code for doctr.models.recognition.parseq.tensorflow
Source code for doctr.models.recognition.sar.tensorflow
Source code for doctr.models.recognition.vitstr.tensorflow
Source code for doctr.models.recognition.zoo
Source code for doctr.models.zoo
Source code for doctr.transforms.modules.base
Source code for doctr.transforms.modules.tensorflow
Source code for doctr.utils.metrics
Source code for doctr.utils.visualization
All modules for which code is available
v0.1.0 (2021-03-05)
diff --git a/v0.1.0/community/resources.html b/v0.1.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.0/community/resources.html
+++ b/v0.1.0/community/resources.html
@@ -14,7 +14,7 @@
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
diff --git a/v0.1.0/contributing/code_of_conduct.html b/v0.1.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.0/contributing/code_of_conduct.html
+++ b/v0.1.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
diff --git a/v0.1.0/contributing/contributing.html b/v0.1.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.0/contributing/contributing.html
+++ b/v0.1.0/contributing/contributing.html
@@ -14,7 +14,7 @@
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
diff --git a/v0.1.0/genindex.html b/v0.1.0/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.0/genindex.html
+++ b/v0.1.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
diff --git a/v0.1.0/getting_started/installing.html b/v0.1.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.0/getting_started/installing.html
+++ b/v0.1.0/getting_started/installing.html
@@ -14,7 +14,7 @@
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
diff --git a/v0.1.0/index.html b/v0.1.0/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.0/index.html
+++ b/v0.1.0/index.html
@@ -14,7 +14,7 @@
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
diff --git a/v0.1.0/modules/contrib.html b/v0.1.0/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.0/modules/contrib.html
+++ b/v0.1.0/modules/contrib.html
@@ -14,7 +14,7 @@
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
diff --git a/v0.1.0/modules/datasets.html b/v0.1.0/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.0/modules/datasets.html
+++ b/v0.1.0/modules/datasets.html
@@ -14,7 +14,7 @@
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
diff --git a/v0.1.0/modules/io.html b/v0.1.0/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.0/modules/io.html
+++ b/v0.1.0/modules/io.html
@@ -14,7 +14,7 @@
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
diff --git a/v0.1.0/modules/models.html b/v0.1.0/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.0/modules/models.html
+++ b/v0.1.0/modules/models.html
@@ -14,7 +14,7 @@
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
diff --git a/v0.1.0/modules/transforms.html b/v0.1.0/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.0/modules/transforms.html
+++ b/v0.1.0/modules/transforms.html
@@ -14,7 +14,7 @@
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶<
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
Search - docTR documentation
@@ -340,7 +340,7 @@
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets.core - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
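Note: doctr.datasets.core is deleted because VisionDataset now lives in doctr.datasets.datasets (see the updated imports in the cord and funsd hunks). For reference, a hypothetical minimal subclass illustrating the constructor contract (url, file_name, file_hash, extract_archive) that the class keeps; every name other than VisionDataset is illustrative:

from typing import Any, Dict, List, Tuple

from doctr.datasets.datasets import VisionDataset


class MyZipDataset(VisionDataset):
    """Hypothetical dataset: download an archive, extract it, fill self.data."""

    def __init__(self, **kwargs: Any) -> None:
        super().__init__(
            "https://example.com/my_dataset.zip",  # url (placeholder)
            "my_dataset.zip",                      # file_name
            None,                                  # file_hash: skip the SHA256 check
            True,                                  # extract_archive
            **kwargs,
        )
        # Populate (image path, target) pairs from the extracted folder
        self.data: List[Tuple[str, Dict[str, Any]]] = []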
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
doctr.datasets.iiithws - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
doctr.datasets.imgur5k - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before passing it to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
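The updated `DataLoader` docstring above, restated as a runnable sketch (the custom `collate_fn` at the end is illustrative):

from doctr.datasets import CORD, DataLoader

train_set = CORD(train=True, download=True)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True, drop_last=False)
print(len(train_loader))                    # number of batches, via the new __len__
images, targets = next(iter(train_loader))

# The removed `workers` argument gives way to an optional custom collate:
pair_loader = DataLoader(train_set, batch_size=16, collate_fn=lambda samples: list(zip(*samples)))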
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
doctr.datasets.mjsynth - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
doctr.datasets.ocr - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
doctr.datasets.recognition - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
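A usage sketch matching the rewritten SROIE docstring above (`download=True` is assumed to come from `VisionDataset`):

from doctr.datasets import SROIE

train_set = SROIE(train=True, download=True)
img, target = train_set[0]            # target: {"boxes": (N, 4) ndarray, "labels": [str, ...]}

# use_polygons=True keeps the raw four (x, y) corners per box, shape (N, 4, 2),
# instead of the (xmin, ymin, xmax, ymax) reduction computed above
poly_set = SROIE(train=True, download=True, use_polygons=True)

# recognition_task=True stores (word crop, label) pairs via crop_bboxes_from_image
rec_set = SROIE(train=True, download=True, recognition_task=True)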
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
doctr.datasets.svhn - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
doctr.datasets.svt - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
doctr.datasets.synthtext - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char still not in vocab, return unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
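To make the `eos`/`pad` handling of `encode_sequences` concrete, a small round trip (the 26-letter vocab and the symbol values are illustrative; both must lie outside vocab indices, as enforced above):

from doctr.datasets.utils import decode_sequence, encode_sequences

vocab = "abcdefghijklmnopqrstuvwxyz"
encoded = encode_sequences(["cat", "hello"], vocab, eos=26, pad=27)
# shape (2, 7): longest word (5 chars) + 1 EOS slot + 1 PAD slot;
# "cat" encodes to [2, 0, 19, 26, 27, 27, 27]
seq = encoded[0]
word = decode_sequence(seq[seq < len(vocab)].tolist(), vocab)   # -> "cat"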
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
doctr.datasets.wildreceipt - docTR documentation
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- doctr.documents.elements - docTR documentation
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
\ No newline at end of file
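Pieced together from the class definitions recorded above, a minimal sketch of how the removed `doctr.documents.elements` hierarchy composed (coordinates are relative page coordinates, per the docstrings):

from doctr.documents.elements import Word, Line, Block, Page, Document

w1 = Word("Hello", 0.99, ((0.10, 0.10), (0.25, 0.14)))
w2 = Word("world", 0.98, ((0.27, 0.10), (0.42, 0.14)))
line = Line([w1, w2])                  # geometry resolved to the smallest enclosing bbox
block = Block(lines=[line])
page = Page(blocks=[block], page_idx=0, dimensions=(595, 842))
doc = Document(pages=[page])

print(doc.render())                    # "Hello world"
print(page.export()["blocks"][0]["lines"][0]["words"][0]["value"])   # "Hello"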
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- doctr.documents.reader - docTR documentation
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document loaded with fitz
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Default goes to 840 x 595 for A4 pdf,
- if you want to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
\ No newline at end of file
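And the matching entry points from the removed `doctr.documents.reader`, condensed from its own docstring examples (file paths are placeholders):

from doctr.documents import DocumentFile

pages = DocumentFile.from_images(["page1.png", "page2.png"])   # List[np.ndarray], H x W x 3
pdf = DocumentFile.from_pdf("doc.pdf")                         # PDF wrapper around a fitz.Document
images = pdf.as_images(output_size=(1024, 726))                # rasterized pages
words = pdf.get_words()                                        # per page: [(bbox, value), ...]
artefacts = pdf.get_artefacts()                                # per page: [bbox, ...]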
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
doctr.io.elements - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
doctr.io.html - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
doctr.io.image.base - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
doctr.io.image.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
doctr.io.pdf - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
doctr.io.reader - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
doctr.models.classification.mobilenet.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
doctr.models.classification.resnet.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
doctr.models.classification.textnet.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
doctr.models.classification.vgg.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
doctr.models.classification.vit.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
doctr.models.classification.zoo - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to expand (unshrink) polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: the polygon to expand, as an array of 2D points
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
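# Illustration (not part of the deleted source): the unclip step performed by
# polygon_to_box above, as a standalone sketch; the square and the ratio are
# arbitrary example values.
import numpy as np
import pyclipper
from shapely.geometry import Polygon

points = np.array([[10, 10], [90, 10], [90, 40], [10, 40]])
unclip_ratio = 1.5

poly = Polygon(points)
distance = poly.area * unclip_ratio / poly.length    # area * ratio / perimeter
offset = pyclipper.PyclipperOffset()
offset.AddPath([tuple(map(int, pt)) for pt in points], pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
expanded = np.asarray(offset.Execute(distance)[0])   # expanded polygon, fed to cv2.boundingRect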
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1):
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature maps is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
-
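# Note (not part of the deleted source): compute_distance is the law of cosines
# applied at each map point P against segment [a, b]. With d1 = |Pa|^2,
# d2 = |Pb|^2 and d = |ab|^2, it takes cos = (d - d1 - d2) / (2 * sqrt(d1 * d2)),
# then distance = sqrt(d1 * d2 * (1 - cos^2) / d), i.e. the perpendicular distance
# to the line; where cos < 0 it falls back to the nearer endpoint distance.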
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon : array of coord., to draw the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
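For intuition, a standalone sketch of the Vatti offset used here and in compute_target below; the 100x40 box and shrink_ratio of 0.4 are illustrative assumptions:

# Hedged sketch: offsetting a box by D = A * (1 - r**2) / L, as in the DB paper.
import numpy as np
import pyclipper
from shapely.geometry import Polygon

box = [(0, 0), (100, 0), (100, 40), (0, 40)]
shape = Polygon(box)
distance = shape.area * (1 - 0.4 ** 2) / shape.length  # 4000 * 0.84 / 280 = 12
offset = pyclipper.PyclipperOffset()
offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
shrinked = np.array(offset.Execute(-distance)[0])  # roughly (12, 12) to (88, 28)
# A positive delta, as used in draw_thresh_map, pads the polygon outwards instead.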
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, fully masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
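On toy values, the hard-negative selection above behaves as follows (a sketch, not the library API):

# Hedged sketch: balanced BCE keeps the k hardest negatives, k = 3 * #positives.
import tensorflow as tf

bce = tf.constant([0.9, 0.1, 0.8, 0.2, 0.7, 0.05])  # per-pixel BCE losses
seg = tf.constant([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])   # a single positive pixel
neg = 1.0 - seg
pos_count = tf.reduce_sum(seg)                               # 1.0
neg_count = tf.minimum(tf.reduce_sum(neg), 3.0 * pos_count)  # 3.0
hard_neg, _ = tf.nn.top_k(bce * neg, tf.cast(neg_count, tf.int32))  # [0.8, 0.7, 0.2]
loss = (tf.reduce_sum(bce * seg) + tf.reduce_sum(hard_neg)) / (pos_count + neg_count + 1e-6)
# (0.9 + 0.8 + 0.7 + 0.2) / 4 is ~0.65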
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
-
- doctr.models.detection.linknet - docTR documentation
-
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor of boxes for the bitmap, where each box is a 5-element list
- containing xmin, ymin, xmax, ymax, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num):  # labels start at 1, 0 is the background
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
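A minimal sketch of the connected-component step on a toy bitmap (values are illustrative):

# Hedged sketch: extracting blobs and bounding boxes from a binary map.
import numpy as np
import cv2

bitmap = np.zeros((8, 8), dtype=np.uint8)
bitmap[1:3, 1:5] = 1  # one blob
bitmap[5:7, 6:8] = 1  # another blob
label_num, labelimage = cv2.connectedComponents(bitmap, connectivity=4)
# label_num == 3: background (0) plus two blobs (1 and 2)
points = np.array(np.where(labelimage == 1)[::-1]).T  # (x, y) pairs of blob 1
x, y, w, h = cv2.boundingRect(points.astype(np.int32))  # (1, 1, 4, 2)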
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
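Each decoder block ends in a stride-2 transposed convolution, so every decoder output recovers the resolution of the matching encoder input; a quick shape check with toy dimensions:

# Hedged sketch: a stride-2 transposed conv doubles the spatial resolution.
import tensorflow as tf

x = tf.zeros((1, 16, 16, 512))
up = tf.keras.layers.Conv2DTranspose(128, 3, strides=2, padding="same")(x)
assert up.shape == (1, 32, 32, 128)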
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, fully masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
-
- doctr.models.export - docTR documentation
-
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
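The bytes returned by any of the three converters above can be loaded back with the standard TFLite interpreter; a brief, hedged usage sketch (serialized_model as in the docstring examples):

# Hedged sketch: run a serialized TFLite model produced by the converters above.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_content=serialized_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
result = interpreter.get_tensor(out["index"])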
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
-
- doctr.models.recognition.crnn - docTR documentation
-
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
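For intuition, the same keras-backend call on toy logits (shapes are illustrative):

# Hedged sketch: greedy CTC decoding of one 4-step sequence over a 3-char vocab + blank.
import tensorflow as tf

logits = tf.random.uniform((1, 4, 4))  # (batch, seq_len, vocab_size + 1)
decoded, _ = tf.nn.ctc_greedy_decoder(
    tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),  # time-major input
    tf.fill([1], 4),                                      # per-sample sequence lengths
    merge_repeated=True,
)
dense = tf.sparse.to_dense(decoded[0], default_value=3)   # 3 == blank index here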
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs CTC decoding of the raw model output, then maps the decoded
- predictions back to strings with the label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of ground-truth words for the batch
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
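The transpose/reshape above is the central CRNN trick, turning a 2D feature map into a width-major sequence; a standalone shape trace with toy dimensions:

# Hedged sketch: (B, H, W, C) feature map -> (B, W, H * C) sequence of W timesteps.
import tensorflow as tf

features = tf.zeros((2, 4, 32, 64))                     # B=2, H=4, W=32, C=64
transposed = tf.transpose(features, perm=[0, 2, 1, 3])  # (2, 32, 4, 64)
seq = tf.reshape(transposed, (-1, 32, 4 * 64))          # (2, 32, 256)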
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
-
- doctr.models.recognition.sar - docTR documentation
-
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
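Note that the virtual START index (vocab_size + 1) lies outside the one-hot depth, so the first decoding step embeds an all-zeros vector; a quick check of that tf.one_hot behaviour:

# Hedged sketch: out-of-range indices one-hot encode to all zeros.
import tensorflow as tf

vocab_size = 5
start = tf.fill([2], vocab_size + 1)              # index 6, valid range is 0..5
onehot = tf.one_hot(start, depth=vocab_size + 1)  # shape (2, 6), all zeros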
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
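The masking step hinges on tf.sequence_mask; a tiny standalone illustration with toy values:

# Hedged sketch: zero out per-timestep losses after each sequence's <eos>.
import tensorflow as tf

cce = tf.constant([[0.5, 0.4, 0.3, 0.2]])  # (batch=1, timesteps=4)
seq_len = tf.constant([2]) + 1              # word length 2, +1 for <eos>
mask = tf.sequence_mask(seq_len, 4)         # [[True, True, True, False]]
masked = tf.where(mask, cce, tf.zeros_like(cce))
loss = tf.reduce_sum(masked, axis=1) / tf.cast(seq_len, tf.float32)  # [(0.5+0.4+0.3)/3]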
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
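For context, a minimal usage sketch of the rewritten predictor factory above (assuming a working TensorFlow or PyTorch backend and downloadable pretrained weights):

import numpy as np
from doctr.models import recognition_predictor

# batch_size and symmetric_pad are forwarded to the PreProcessor
predictor = recognition_predictor("crnn_vgg16_bn", pretrained=True, symmetric_pad=True)
crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)  # a single word crop (H, W, C)
out = predictor([crop])  # list of (word, confidence) pairs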
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the general page orientation
+ based on the median line orientation of the segmentation map.
+ Then rotates the page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the general page orientation
+ based on the median line orientation of the segmentation map.
+ Then rotates the page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `KIEPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
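Since both `det_arch` and `reco_arch` accept either an architecture name or a model instance, a custom-configured detection model can be passed directly; a minimal sketch (assuming pretrained weights are available):

from doctr.models import detection, kie_predictor, ocr_predictor

det_model = detection.db_resnet50(pretrained=True)  # a model instance instead of a name
ocr = ocr_predictor(det_arch=det_model, reco_arch="crnn_vgg16_bn", pretrained=True)
kie = kie_predictor(det_arch="db_resnet50", reco_arch="crnn_vgg16_bn", pretrained=True)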
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Brightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- p: probability to apply transformation
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Contrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Saturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Hue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Gamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = JpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = OneOf([JpegQuality(), Gamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = RandomApply(Gamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here, otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
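For instance, the four tolerance levels behave as follows (assuming anyascii transliterates '€' to 'EUR'):

>>> string_match('Hello', 'hello')
(False, True, False, True)
>>> string_match('EUR', '€')
(False, False, True, True)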
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
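Concretely, the docstring example yields one exact-match accuracy per tolerance level:

>>> metric = TextMatch()
>>> metric.update(['Hello', 'world'], ['hello', 'world'])
>>> metric.summary()
{'raw': 0.5, 'caseless': 1.0, 'anyascii': 0.5, 'unicase': 1.0}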
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
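For example, with one ground-truth box and two predictions, only the first of which overlaps:

>>> import numpy as np
>>> box_iou(np.array([[0, 0, 100, 100]]), np.array([[0, 0, 50, 100], [200, 200, 300, 300]]))
array([[0.5, 0. ]], dtype=float32)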
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
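For axis-aligned rectangles the rotated variant reduces to the same ratio; a short check:

>>> import numpy as np
>>> polys_1 = np.array([[[0, 0], [1, 0], [1, 1], [0, 1]]], dtype=np.float32)
>>> polys_2 = np.array([[[0, 0], [0.5, 0], [0.5, 1], [0, 1]]], dtype=np.float32)
>>> polygon_iou(polys_1, polys_2)
array([[0.5]], dtype=float32)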
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
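A minimal run where the single prediction reaches exactly the IoU threshold, returning (recall, precision, mean IoU):

>>> import numpy as np
>>> metric = LocalizationConfusion(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 50, 100]]))
>>> metric.summary()
(1.0, 1.0, 0.5)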
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
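Its summary therefore returns one recall/precision dictionary per tolerance level plus the mean IoU; a minimal sketch with a perfectly localized but case-mismatched word:

>>> import numpy as np
>>> metric = OCRMetric(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 100, 100]]), ['Hello'], ['hello'])
>>> recall, precision, mean_iou = metric.summary()
>>> recall['raw'], recall['caseless']
(0.0, 1.0)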
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision of the class predictions and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
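Same output shape as OCRMetric, with class agreement replacing string comparison:

>>> import numpy as np
>>> metric = DetectionMetric(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 100, 100]]),
>>>               np.array([0], dtype=np.int64), np.array([0], dtype=np.int64))
>>> metric.summary()
(1.0, 1.0, 1.0)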
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor was instantiated with preserve_aspect_ratio=True
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor was instantiated with preserve_aspect_ratio=True
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
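The dispatcher keeps the drawing code geometry-agnostic; for illustration:

>>> import numpy as np
>>> # straight box: ((xmin, ymin), (xmax, ymax)) in relative coordinates
>>> create_obj_patch(((0.1, 0.1), (0.4, 0.2)), (600, 800), color=(0, 0, 1))
>>> # rotated box: 4 corner points as a (4, 2) array
>>> create_obj_patch(np.array([[0.1, 0.1], [0.4, 0.1], [0.4, 0.2], [0.1, 0.2]]), (600, 800))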
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
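Used below to assign one distinct color per KIE class:

>>> colors = get_colors(3)  # three evenly-spaced hues as RGB triplets in [0, 1]
>>> len(colors)
3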
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
 # Create an mplcursors Cursor to hover over patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
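Note that mplcursors is imported lazily inside the function, so interactive=True assumes that package is installed; with interactive=False and add_labels=True, straight boxes get static text labels instead. A usage sketch with the current predictor API (assuming ocr_predictor and doctr.io.DocumentFile are available, as in recent docTR releases):

>>> import matplotlib.pyplot as plt
>>> from doctr.io import DocumentFile
>>> from doctr.models import ocr_predictor
>>> from doctr.utils.visualization import visualize_page
>>> pages = DocumentFile.from_images(["path/to/your/page.png"])
>>> result = ocr_predictor(pretrained=True)(pages)
>>> fig = visualize_page(result.pages[0].export(), pages[0], interactive=False)
>>> plt.show()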
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest window side
+ interactive: whether the plot should be interactive
+ add_labels: for static plots, adds text labels on top of the bounding boxes
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create an mplcursors Cursor to hover over patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
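For KIE output, page['predictions'] maps each class key to a list of predictions, and get_colors assigns one color per class. A sketch assuming kie_predictor is available (as in recent docTR releases):

>>> from doctr.io import DocumentFile
>>> from doctr.models import kie_predictor
>>> from doctr.utils.visualization import visualize_kie_page
>>> pages = DocumentFile.from_images(["path/to/your/page.png"])
>>> result = kie_predictor(pretrained=True)(pages)
>>> fig = visualize_kie_page(result.pages[0].export(), pages[0], interactive=False)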
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
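Unlike the figure-returning helpers above, draw_boxes renders with OpenCV directly onto the image and pushes it to the current matplotlib axes, so the caller is expected to call plt.show() afterwards. A self-contained sketch:

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from doctr.utils.visualization import draw_boxes
>>> image = np.zeros((200, 300, 3), dtype=np.uint8)
>>> # relative (xmin, ymin, xmax, ymax) boxes, as produced by detection predictors
>>> boxes = np.array([[0.1, 0.1, 0.4, 0.3], [0.5, 0.5, 0.9, 0.8]])
>>> draw_boxes(boxes, image, color=(0, 255, 0))
>>> plt.show()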
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
-
-
+
+
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
-
-
+
+
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-.. autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its own way of loading a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
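For context, a sketch of the usage this (since removed) page documented, using only the signature given above: characters are mapped to their index in the vocab string, and sequences are padded with the eos value up to target_size:

>>> from doctr.datasets import encode_sequences
>>> vocab = "abcdefghijklmnopqrstuvwxyz"
>>> encoded = encode_sequences(["cat", "do"], vocab=vocab, target_size=4, eos=len(vocab))
>>> encoded.shape
(2, 4)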
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words on the same horizontal level form two separate Lines, one per column).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developer mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Whether performed jointly or separately, each task corresponds to a specific type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors let you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 30595 word-level crops, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, we warm-up the model and then we measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection stage produces cropped images that are then passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
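For completeness, the loaded object can then be called like the original model (a sketch mirroring the save example above; the input shape is assumed to match the one traced at save time):

>>> import tensorflow as tf
>>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
>>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
>>> out = model(input_t, training=False)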
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedure. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
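A sketch of how these compose (class names per the directives above; the Resize target size and probability are illustrative values):

>>> from doctr.transforms import Compose, Resize, ToGray, RandomApply
>>> transfo = Compose([Resize((512, 512)), RandomApply(ToGray(), p=0.3)])
>>> # transfo can then be passed as `sample_transforms` to any dataset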
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module gathers non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model's performance.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
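As an illustration of the metrics API (a sketch based on the update/summary pattern used in recent docTR releases; exact return values vary by class):

>>> import numpy as np
>>> from doctr.utils.metrics import LocalizationConfusion
>>> metric = LocalizationConfusion(iou_thresh=0.5)
>>> # ground-truth boxes vs. predicted boxes, both as relative (xmin, ymin, xmax, ymax)
>>> metric.update(np.array([[0.1, 0.1, 0.4, 0.3]]), np.array([[0.12, 0.11, 0.41, 0.31]]))
>>> recall, precision, mean_iou = metric.summary()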
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
- doctr.datasets - docTR documentation
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its own way of loading a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = CORD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-¶
-
-
-
-
-
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
- doctr.documents - docTR documentation
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words on the same horizontal level form two separate Lines, one per column).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into images in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
Attribution
-
+
diff --git a/v0.1.0/contributing/contributing.html b/v0.1.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.0/contributing/contributing.html
+++ b/v0.1.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.0/genindex.html b/v0.1.0/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.0/genindex.html
+++ b/v0.1.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.0/getting_started/installing.html b/v0.1.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.0/getting_started/installing.html
+++ b/v0.1.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.0/index.html b/v0.1.0/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.0/index.html
+++ b/v0.1.0/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.0/modules/contrib.html b/v0.1.0/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.0/modules/contrib.html
+++ b/v0.1.0/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.0/modules/datasets.html b/v0.1.0/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.0/modules/datasets.html
+++ b/v0.1.0/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.0/modules/io.html b/v0.1.0/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.0/modules/io.html
+++ b/v0.1.0/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.0/modules/models.html b/v0.1.0/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.0/modules/models.html
+++ b/v0.1.0/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/transforms.html b/v0.1.0/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.0/modules/transforms.html
+++ b/v0.1.0/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
-Search.setIndex({…})  (single-line minified Sphinx search index for v0.1.0; regenerated payload elided)
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.0/using_doctr/custom_models_training.html b/v0.1.0/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.0/using_doctr/custom_models_training.html
+++ b/v0.1.0/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.0/using_doctr/running_on_aws.html b/v0.1.0/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.0/using_doctr/running_on_aws.html
+++ b/v0.1.0/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
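The searchtools.js hunks above are the only substantive code delta in this sync: the rebuilt Sphinx assets append a kind field (one of index, object, text or title, per the new SearchResultKind enum) to every search-result tuple, and _displayItem() now tags each rendered result with a matching kind-* CSS class. Below is a minimal sketch of how a theme could hook into that class, assuming default Sphinx search markup; the selectors mirror the class names added above, while the concrete styles are illustrative only:

    /* Hypothetical theme rules keyed on the kind-* classes that the
       updated _displayItem() adds to each search-result <li>. */
    ul.search li.kind-title  { font-weight: 600; }  /* page-title hits stand out */
    ul.search li.kind-object::before { content: "API: "; color: #888; }
    ul.search li.kind-index,
    ul.search li.kind-text   { opacity: 0.9; }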
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
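The rewritten constructor above introduces three mutually exclusive loading modes. A minimal sketch of how they could be exercised, assuming the doctr package from this diff is installed (argument names taken from the signature above):

from doctr.datasets import CORD

# Default mode: each sample pairs an image with boxes and labels
train_set = CORD(train=True, download=True)
img, target = train_set[0]  # target == dict(boxes=..., labels=...)

# Recognition mode: samples become (cropped word image, text string) pairs
rec_set = CORD(train=True, download=True, recognition_task=True)
crop, text = rec_set[0]

# Detection mode: samples become (image, boxes array) pairs
det_set = CORD(train=True, download=True, detection_task=True)
img, boxes = det_set[0]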
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
- doctr.datasets.core - docTR documentation
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
\ No newline at end of file
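For reference, the `VisionDataset` deleted above was meant to be subclassed; a minimal sketch of how that looked under the old API, where `MyDataset` and its URL are hypothetical placeholders:

from doctr.datasets.core import VisionDataset  # module removed in this diff

class MyDataset(VisionDataset):  # hypothetical dataset for illustration
    URL = "https://example.com/my_dataset.zip"  # placeholder archive URL

    def __init__(self, **kwargs):
        # download (with download=True) and extract the archive
        super().__init__(self.URL, None, None, True, **kwargs)
        # then populate self.data with entries found under self._root
        self.data = []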
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
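With the new `use_polygons` flag above, each straight box is expanded into its four corners. A quick sketch of the resulting target geometry, assuming the package from this diff:

from doctr.datasets import FUNSD

train_set = FUNSD(train=True, download=True, use_polygons=True)
img, target = train_set[0]
# each entry of target["boxes"] is a (4, 2) polygon:
# top left, top right, bottom right, bottom left corners
print(target["boxes"].shape)  # (num_words, 4, 2)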
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
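The `workers` argument is dropped above in favour of an optional `collate_fn` override, and samples are now fetched with a plain `map`. A short sketch of both the default and a custom collation (the lambda is illustrative only):

from doctr.datasets import FUNSD, DataLoader

train_set = FUNSD(train=True, download=True)

# Default collation stacks images with tf.stack and lists the targets
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
train_iter = iter(train_loader)
images, targets = next(train_iter)

# Custom collation: keep each batch as a plain list of samples
raw_loader = DataLoader(train_set, batch_size=32, collate_fn=lambda samples: list(samples))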
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
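The coordinate handling above (8 values per annotation row, reduced to a straight box when `use_polygons` is False) can be checked in isolation; a self-contained numpy sketch with a made-up annotation row:

import numpy as np

row = ["10", "5", "40", "5", "40", "20", "10", "20"]  # first 8 fields of one CSV row
coords = np.array(list(map(int, row)), dtype=np.float32).reshape((4, 2))
# xmin, ymin, xmax, ymax
straight = np.concatenate((coords.min(axis=0), coords.max(axis=0)))
print(straight)  # [10.  5. 40. 20.]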
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char still not in vocab, return unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
-
-
+
+
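A worked example of the padding logic in the new `encode_sequences` above, with a toy vocab and `eos`/`sos`/`pad` chosen outside the vocab indices as the code requires:

from doctr.datasets.utils import encode_sequences

vocab = "abc"
encoded = encode_sequences(["ab", "c"], vocab, eos=3, sos=4, pad=5)
# Each word is encoded, followed by EOS (3), padded with PAD (5),
# then prefixed with SOS (4):
print(encoded)
# [[4 0 1 3 5]
#  [4 2 3 5 5]]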
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- doctr.documents.elements - docTR documentation
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
\ No newline at end of file
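The deleted hierarchy above composed Word -> Line -> Block -> Page -> Document; a minimal sketch of building and rendering it by hand under that old API (geometry values are illustrative relative coordinates):

from doctr.documents.elements import Word, Line, Block, Page, Document  # module removed in this diff

w1 = Word("Hello", 0.99, ((0.10, 0.10), (0.30, 0.20)))
w2 = Word("world", 0.98, ((0.35, 0.10), (0.60, 0.20)))
line = Line([w1, w2])  # geometry resolved as the enclosing bbox of the words
page = Page(blocks=[Block(lines=[line])], page_idx=0, dimensions=(1024, 768))
doc = Document(pages=[page])
print(doc.render())  # "Hello world"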
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- doctr.documents.reader - docTR documentation
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document opened as a fitz.Document
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Default goes to 840 x 595 for an A4 pdf;
- if you want to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
-            the list of per-page annotations, each represented as a list of (bounding box, value) tuples
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
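-        """Get the bounding boxes of the artefacts (embedded images) of a given page"""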
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
-            the list of per-page artefacts, each represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-class DocumentFile:
-    """Read a document from one of the supported formats (images, PDF files or web pages)"""
-
-
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
-        """Read an image file (or a collection of image files) and convert them into images in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
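-
-# A short sketch of the three constructors above (assuming the referenced files exist):
-# >>> pages = DocumentFile.from_images("page1.png")         # single image -> 1-element list
-# >>> pages = DocumentFile.from_pdf("doc.pdf").as_images()  # PDF pages as numpy arrays
-# >>> words = DocumentFile.from_url("https://www.yoursite.com").get_words()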
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
<
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
-
- doctr.models.detection.differentiable_binarization - docTR documentation
-
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
-        unclip_ratio: ratio used to unshrink polygons
-        max_candidates: maximum boxes to consider in a single page
-        box_thresh: minimal objectness score to consider a box
-        bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
-        """Expand a polygon (points) by a factor unclip_ratio, and return a 4-point box
-
-        Args:
-            points: the polygon vertices, as an array of shape (N, 2)
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
-        # Keep the candidate polygon with the most points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
-            # We ensure that _points can be correctly cast to a ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
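-
-        # The expansion distance above follows the DB paper: D = A * r / L, with A the
-        # polygon area, L its perimeter and r the unclip ratio. Sanity check on a 100 x 10 box:
-        # >>> from shapely.geometry import Polygon
-        # >>> poly = Polygon([(0, 0), (100, 0), (100, 10), (0, 10)])
-        # >>> poly.area * 1.5 / poly.length  # ~6.8 px offset applied around the polygon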
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
-            # Skip contours whose smallest enclosing bounding box is too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
-            if _box is None or _box[2] < min_size_box or _box[3] < min_size_box:  # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
-        channels: number of channels to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
-        # Upsample & sum (top-down, from the coarsest map to the finest)
-        for idx in range(len(results) - 2, -1, -1):
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
-        feature_extractor: the backbone serving as feature extractor
-        fpn_channels: number of channels each extracted feature map is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
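-
-        # Geometry note: `cosin` is the negated cosine of the angle a-P-b at each pixel P
-        # (law of cosines), and the main formula is the triangle-height identity
-        # dist(P, line_ab) = |Pa| * |Pb| * sin(aPb) / |ab|; when `cosin` < 0, P lies beyond
-        # one of the segment ends, so the distance to the nearest endpoint is used instead.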
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
-        """Draw a polygon threshold map on a canvas, as described in the DB paper
-
-        Args:
-            polygon: array of coordinates defining the boundary of the polygon
-            canvas: threshold map to fill with polygons
-            mask: mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
-            raise AttributeError("polygon should be a 2-dimensional array of coordinates")
-
-        # Expand the polygon according to shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
-        # Shift the polygon into the padded bounding box frame for the distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
-        # Build coordinate grids over the padded bounding box
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
-        seg_mask = np.ones(output_shape, dtype=bool)  # np.bool is deprecated in modern NumPy
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
-                # Empty image, fully masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
-                # Negative shrink for gt, as described in the paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
-                    seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
-                    seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
-        """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
-        and flags for each image, then compute the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
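-
-        # The returned objective hence follows the DB paper: 5x a hard-negative-balanced BCE
-        # on the probability map, a weighted dice loss on the approximate binary map, and
-        # 10x an L1 loss on the threshold map (computed only where thresh_mask is set).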
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
-
- doctr.models.detection.linknet - docTR documentation
-
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
-        bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
-            pred: probability map output by the LinkNet model
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
-        for label in range(1, label_num):  # label 0 is the background component
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
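-
-# Each decoder block hence follows the LinkNet paper: a 1x1 conv shrinking channels to
-# in_chan // 4, a stride-2 transposed conv doubling the spatial resolution, then a 1x1
-# conv projecting to out_chan, so the output can be summed with the matching encoder map.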
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
-        seg_target = np.zeros(output_shape, dtype=bool)  # np.bool is deprecated in modern NumPy
-        seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
-                # Empty image, fully masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
-        """Compute a batch of gts and masks from a list of boxes and flags for each image,
-        then compute the loss with the probability map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
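+
+# A short sketch of the new model-instance path (assuming pretrained weights are available):
+# >>> from doctr.models import detection_predictor, db_resnet50
+# >>> model = db_resnet50(pretrained=True)
+# >>> predictor = detection_predictor(arch=model, assume_straight_pages=False)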
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
-
- doctr.models.export - docTR documentation
-
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
-    # Representative dataset used to calibrate the int8 dynamic ranges
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
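-
-# Note: the random representative dataset above only exercises the graph for calibration;
-# meaningful int8 ranges would come from feeding ~100 real, preprocessed document images
-# instead of np.random.rand samples.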
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
-
- doctr.models.recognition.crnn - docTR documentation
-
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
-        Decode logits with the TensorFlow greedy CTC decoder
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
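-
-        # A quick illustration of the greedy decoding above, with vocab "ab" (blank index 2):
-        # per-step argmax [a, a, blank, a, b, b] -> merge repeats -> [a, blank, a, b]
-        # -> drop blanks -> "aab"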
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
-        Perform CTC decoding of the raw output, then map the predictions back to
-        characters with the label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
-            model_output: predicted logits of the model
-            target: list of target strings (encoded internally via compute_target)
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
-            # Decode raw predictions into character strings
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
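As a sanity check on the shape comments above, this standalone sketch (hypothetical sizes) reproduces the attention-weighted reduction with plain TensorFlow ops:

import tensorflow as tf

N, H, W, C = 2, 4, 8, 16
features = tf.random.normal((N, H, W, C))
attention = tf.nn.softmax(tf.random.normal((N, H * W)))  # (N, H * W)
attention_map = tf.reshape(attention, (-1, H, W, 1))     # (N, H, W, 1)
glimpse = tf.reduce_sum(features * attention_map, axis=[1, 2])
# glimpse: (2, 16), one attention-pooled feature vector per sample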
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embeded_symbol: shape (N, embedding_units)
- embeded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embeded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, rnn_units)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, 2 * rnn_units) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
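The masking above is easier to follow on toy data; this hedged sketch (arbitrary lengths and vocabulary size) shows how `tf.sequence_mask` discards the loss terms falling after each word's EOS:

import tensorflow as tf

batch, time, classes = 2, 6, 5
logits = tf.random.normal((batch, time, classes))
gt = tf.random.uniform((batch, time), maxval=classes, dtype=tf.int32)
seq_len = tf.constant([3, 5])  # effective lengths, EOS included
cce = tf.nn.softmax_cross_entropy_with_logits(tf.one_hot(gt, classes), logits)
mask = tf.sequence_mask(seq_len, time)            # (2, 6) booleans
masked = tf.where(mask, cce, tf.zeros_like(cce))  # zero out post-EOS steps
loss = tf.reduce_sum(masked, axis=1) / tf.cast(seq_len, tf.float32)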
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
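The `<eos>`-splitting trick above can look opaque; a miniature example (fake decoded strings, same ops) shows how each prediction is cut at its first EOS marker:

import tensorflow as tf

decoded = tf.constant(["cat<eos>xx", "dog<eos>"])
parts = tf.strings.split(decoded, "<eos>")  # ragged: one row per string
words = tf.sparse.to_dense(parts.to_sparse(), default_value="")[:, 0]
# [w.decode() for w in words.numpy()] -> ['cat', 'dog']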
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example:
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
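As a usage sketch for the predictor assembled here (the crop is a random placeholder; in practice it comes from a detection model), note that the output pairs each word with a confidence score:

import numpy as np
from doctr.models import recognition_predictor

predictor = recognition_predictor('crnn_vgg16_bn', pretrained=True)
crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)  # a word-level crop
out = predictor([crop])  # list of (word, confidence) tuples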
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
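Tying the docstring above to a complete run (the PDF path is a placeholder), a typical end-to-end call looks like:

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
doc = DocumentFile.from_pdf("path/to/document.pdf")  # or DocumentFile.from_images(...)
result = model(doc)
print(result.render())  # plain-text export of the structured OCR output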
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
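And the KIE counterpart (same placeholder caveats); unlike `ocr_predictor`, the result groups detections by predicted class:

import numpy as np
from doctr.models import kie_predictor

model = kie_predictor(pretrained=True)
page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
out = model([page])
predictions = out.pages[0].predictions  # dict mapping class name to predictions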
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example::
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: the offset added to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example::
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example::
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Gamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = JpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomJpegQuality, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
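Since `Compose`, `OneOf` and `RandomApply` nest freely, here is an assumed-usage sketch chaining the modules from this file into one augmentation pipeline:

import tensorflow as tf
from doctr.transforms import (
    Compose, Resize, RandomApply, OneOf,
    RandomBrightness, RandomContrast, RandomGamma,
)

augment = Compose([
    Resize((32, 128)),
    RandomApply(RandomBrightness(max_delta=0.2), p=0.5),
    OneOf([RandomContrast(delta=0.2), RandomGamma()]),
])
img = tf.random.uniform((64, 256, 3), minval=0, maxval=1)
out = augment(img)  # (32, 128, 3), randomly augmented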
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
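A concrete pair makes the four tolerance levels explicit (the ("EUR", "€") case is the one flagged in the comment above; the import assumes direct access to the module-level helper):

from doctr.utils.metrics import string_match

string_match("Hello", "hello")  # (False, True, False, True)
string_match("EUR", "€")        # (False, False, True, True): anyascii maps € to EUR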
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
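A two-box check of the formula, with values chosen so the expected IoU is obvious:

import numpy as np
from doctr.utils.metrics import box_iou

a = np.array([[0, 0, 100, 100]])
b = np.array([[0, 0, 50, 100], [200, 200, 300, 300]])
box_iou(a, b)  # [[0.5, 0.0]]: half overlap, then disjoint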
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
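The same check for the rotated variant, using two overlapping squares written as 4-point polygons (shapely handles truly rotated quadrilaterals identically):

import numpy as np
from doctr.utils.metrics import polygon_iou

sq1 = np.array([[[0, 0], [2, 0], [2, 2], [0, 2]]], dtype=float)
sq2 = np.array([[[1, 0], [3, 0], [3, 2], [1, 2]]], dtype=float)
polygon_iou(sq1, sq2)  # [[0.333...]]: intersection 2 over union 6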
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
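A tiny run-through (arbitrary coordinates and scores) showing one suppression:

import numpy as np
from doctr.utils.metrics import nms

boxes = np.array([
    [0, 0, 10, 10, 0.9],    # highest score: kept
    [1, 1, 11, 11, 0.8],    # IoU ~0.68 with the first -> suppressed at thresh=0.5
    [20, 20, 30, 30, 0.7],  # disjoint: kept
])
nms(boxes, thresh=0.5)  # indices of kept boxes: 0 and 2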
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
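A minimal sketch of the update/summary/reset cycle implemented above, with illustrative box values (one ground truth matched by the first of two predictions):

import numpy as np
from doctr.utils import LocalizationConfusion

metric = LocalizationConfusion(iou_thresh=0.5)
# Relative (xmin, ymin, xmax, ymax) boxes: one ground truth, two predictions
gts = np.array([[0.1, 0.1, 0.4, 0.4]])
preds = np.array([[0.12, 0.08, 0.42, 0.38], [0.6, 0.6, 0.8, 0.8]])
metric.update(gts, preds)
recall, precision, mean_iou = metric.summary()  # -> (1.0, 0.5, 0.39)
metric.reset()  # counters are cleared before the next evaluation run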
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ ... ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
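A minimal sketch of the string comparison above: a perfectly localized box whose transcription differs only by case counts as a caseless match but not as a raw one (box values are illustrative):

import numpy as np
from doctr.utils import OCRMetric

metric = OCRMetric(iou_thresh=0.5)
metric.update(
    np.array([[0.1, 0.1, 0.4, 0.4]]),  # ground-truth boxes
    np.array([[0.1, 0.1, 0.4, 0.4]]),  # predicted boxes (perfect localization)
    ["hello"],                         # ground-truth strings
    ["HELLO"],                         # predicted strings
)
recall, precision, mean_iou = metric.summary()
print(recall["raw"], recall["caseless"])  # 0.0 1.0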
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ ... np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
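A minimal sketch of the category comparison above: an assigned pair only counts when it clears the IoU threshold and the class indices agree (values are illustrative):

import numpy as np
from doctr.utils import DetectionMetric

metric = DetectionMetric(iou_thresh=0.5)
metric.update(
    np.array([[0.1, 0.1, 0.4, 0.4]]),  # ground-truth boxes
    np.array([[0.1, 0.1, 0.4, 0.4]]),  # predicted boxes (perfect localization)
    np.array([0], dtype=np.int64),     # ground-truth class indices
    np.array([1], dtype=np.int64),     # predicted class indices (wrong class)
)
recall, precision, _ = metric.summary()
print(recall, precision)  # 0.0 0.0: the box matches but the class does not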
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor was called with preserve_aspect_ratio=True
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor was called with preserve_aspect_ratio=True
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
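get_colors is what visualize_kie_page (below) relies on to assign one hue per prediction key; each entry is an (r, g, b) tuple of floats in [0, 1], directly usable as a matplotlib color:

palette = get_colors(3)  # three evenly spaced hues with randomized lightness/saturation
print(palette[0])        # e.g. (0.98, 0.12, 0.12) -- exact values vary per call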
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plots, adds text labels on top of the bounding boxes
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create an mplcursors Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plots, adds text labels on top of the bounding boxes
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create an mplcursors Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
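A minimal sketch of draw_boxes on a synthetic page, assuming the function is importable from doctr.utils.visualization (it is listed in __all__ above):

import numpy as np
import matplotlib.pyplot as plt
from doctr.utils.visualization import draw_boxes

image = np.full((200, 300, 3), 255, dtype=np.uint8)  # blank white page
boxes = np.array([[0.1, 0.1, 0.5, 0.4]])             # relative (xmin, ymin, xmax, ymax)
draw_boxes(boxes, image, color=(255, 0, 0))          # rectangles drawn via OpenCV, then shown
plt.show()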
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
-
-
+
+
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
-
-
+
+
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-..autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developper mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Either performed at once or separately, to each task corresponds a type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors lets you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, we warm-up the model and then we measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection will be used to produces cropped images that will be passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedure. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module regroups non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model performances.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
- doctr.datasets - docTR documentation
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
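-For instance, the verified download logic can be reused by subclassing it. A minimal illustrative sketch (the URL and archive name below are placeholders, not a real dataset):
-
-- Example::
>>> from doctr.datasets.core import VisionDataset
>>> class MyDataset(VisionDataset):
...     def __init__(self, **kwargs):
...         super().__init__("https://example.com/my_dataset.zip", "my_dataset.zip", extract_archive=True, **kwargs)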
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its own way of loading a sample, but batch aggregation and iteration are tasks deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-Name           size  characters
-digits         10    0123456789
-ascii_letters  52    abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-punctuation    32    !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-currency       5     £€¥¢฿
-latin          96    0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-french         154   0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as a mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a numpy array
-
-
-
-
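-A minimal usage sketch tying this to the vocabs above (illustrative; it assumes that sequences shorter than target_size are padded with the eos value):
-
-- Example::
>>> from doctr.datasets import encode_sequences
>>> encoded = encode_sequences(sequences=["123", "42"], vocab="0123456789", target_size=4)
>>> encoded.shape
(2, 4)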
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
- doctr.documents - docTR documentation
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
the page's size
-
-
-
-
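-A minimal construction sketch (the values are illustrative; coordinates are relative, as described above):
-
-- Example::
>>> from doctr.documents import Word
>>> word = Word(value="hello", confidence=0.99, geometry=((0.1, 0.1), (0.3, 0.2)))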
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words on the same horizontal level but in different columns belong to two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
orientation – a dictionary with the rotation angle in degrees and the confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
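-Once a document has been analyzed, this nested structure can be traversed directly. A minimal illustrative sketch (it assumes a Document instance named doc, e.g. produced by an OCR predictor, and that attribute names mirror the constructor arguments shown above):
-
-- Example::
>>> for page in doc.pages:
...     for block in page.blocks:
...         for line in block.lines:
...             for word in line.words:
...                 print(word.value, word.confidence)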
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into images in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF file returned as a bytes stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of each page's annotations, represented as a list of (bounding box, value) tuples
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of each page's artefacts, represented as a list of bounding boxes
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
diff --git a/v0.1.0/getting_started/installing.html b/v0.1.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.0/getting_started/installing.html
+++ b/v0.1.0/getting_started/installing.html
@@ -14,7 +14,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
diff --git a/v0.1.0/index.html b/v0.1.0/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.0/index.html
+++ b/v0.1.0/index.html
@@ -14,7 +14,7 @@
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
diff --git a/v0.1.0/modules/contrib.html b/v0.1.0/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.0/modules/contrib.html
+++ b/v0.1.0/modules/contrib.html
@@ -14,7 +14,7 @@
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
diff --git a/v0.1.0/modules/datasets.html b/v0.1.0/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.0/modules/datasets.html
+++ b/v0.1.0/modules/datasets.html
@@ -14,7 +14,7 @@
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
diff --git a/v0.1.0/modules/io.html b/v0.1.0/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.0/modules/io.html
+++ b/v0.1.0/modules/io.html
@@ -14,7 +14,7 @@
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
diff --git a/v0.1.0/modules/models.html b/v0.1.0/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.0/modules/models.html
+++ b/v0.1.0/modules/models.html
@@ -14,7 +14,7 @@
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
diff --git a/v0.1.0/modules/transforms.html b/v0.1.0/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.0/modules/transforms.html
+++ b/v0.1.0/modules/transforms.html
@@ -14,7 +14,7 @@
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
Search - docTR documentation
@@ -340,7 +340,7 @@
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
[… remainder of the minified searchindex.js hunk omitted: the regenerated Sphinx search index mapping the documentation's page titles, API objects (doctr.datasets, doctr.io, doctr.models, doctr.transforms, doctr.utils), and indexed terms]
diff --git a/v0.1.0/using_doctr/custom_models_training.html b/v0.1.0/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.0/using_doctr/custom_models_training.html
+++ b/v0.1.0/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.0/using_doctr/running_on_aws.html b/v0.1.0/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.0/using_doctr/running_on_aws.html
+++ b/v0.1.0/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
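The simplified ul.search rules above target the list markup that searchtools.js builds at query time. As a minimal sketch (an assumption condensed from the searchtools.js hunks further down, not code present in this diff), the generated structure looks roughly like:

// Rough sketch (assumption: condensed from the query/_displayItem flow below).
const searchList = document.createElement("ul");
searchList.setAttribute("role", "list");      // explicit list role, added below in searchtools.js
searchList.classList.add("search");           // matches the ul.search rules above
const listItem = document.createElement("li");
listItem.classList.add("kind-text");          // per-result kind class, see SearchResultKind below
searchList.appendChild(listItem);

With the background-image bullet removed from ul.search li, these per-kind classes become the hook for any result-type styling a theme wants to apply.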
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
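This stopword list is what the full-text search consults when deciding which query terms to ignore. A minimal sketch of that filtering (assuming the stopwords array above is in scope; the helper itself is illustrative and not part of this diff):

// Illustrative only: filter a raw query against the stopword list above.
const filterQueryTerms = (query) =>
  query
    .toLowerCase()
    .split(/\s+/)
    .filter((term) => term.length > 0 && !stopwords.includes(term));

filterQueryTerms("the quick brown fox");  // ["quick", "brown", "fox"]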
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
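Taken together, the searchtools.js hunks above extend every result tuple with a seventh kind field and surface it as a kind-* class on each rendered item. A condensed sketch with toy data (an assumption for illustration; only the tuple shape and SearchResultKind come from the diff):

// Toy data (assumption): the 7-field result tuple introduced in this diff.
const result = [
  "getting_started/installing",  // docname
  "Installation",                // title
  "",                            // anchor
  null,                          // descr
  15,                            // score
  "installing.html",             // filename
  SearchResultKind.title,        // kind, new in this diff
];
const [docname, title, anchor, descr, score, filename, kind] = result;
const listItem = document.createElement("li");
listItem.classList.add(`kind-${kind}`);  // yields li.kind-title; likewise kind-index, kind-object, kind-text

Because SearchResultKind exposes its values as static getters, themes get a stable set of class names (kind-index, kind-object, kind-text, kind-title) to select on.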
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
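The hunk above replaces the CORD constructor's `sample_transforms` argument with the `use_polygons`, `recognition_task` and `detection_task` flags. A minimal usage sketch against the new signature shown in the diff (download and extraction are still handled by `VisionDataset`):

from doctr.datasets import CORD

# Full dataset: each target is a dict of "boxes" (xmin, ymin, xmax, ymax) and "labels"
train_set = CORD(train=True, download=True)
img, target = train_set[0]

# Rotated boxes: each target box becomes a (4, 2) polygon instead of a straight box
poly_set = CORD(train=True, download=True, use_polygons=True)

# Recognition mode: samples are pre-cropped word images paired with their text label
reco_set = CORD(train=True, download=True, recognition_task=True)
crop, label = reco_set[0]

# Per the guard in __init__, enabling both recognition_task and detection_task raises a ValueError.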
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets.core - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
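For reference, the `VisionDataset` removed above handled download, SHA256 verification and archive extraction directly in `__init__`, and subclasses populated `self.data` themselves. A hypothetical subclass under this old API might have looked as follows (a sketch reconstructed from the deleted code; the URL and file name are placeholders, not a real dataset):

import os

from doctr.datasets.core import VisionDataset  # the module deleted in this diff


class MyZipDataset(VisionDataset):
    """Hypothetical dataset: downloads a zip archive and lists its images."""

    def __init__(self) -> None:
        super().__init__(
            "https://example.com/my_dataset.zip",  # placeholder URL
            "my_dataset.zip",
            None,   # no SHA256 check
            True,   # extract the downloaded archive
            download=True,  # required on first use, otherwise __init__ raises ValueError
        )
        # self._root points to the extracted folder
        self.data = [(img_name, None) for img_name in os.listdir(self._root)]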
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
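The polygon conversion added for FUNSD (xmin, ymin, xmax, ymax mapped to the four corners) is easy to verify in isolation. A small numpy sketch of the same transformation, independent of docTR:

import numpy as np

box = [10, 20, 110, 60]  # xmin, ymin, xmax, ymax
# (x, y) coordinates of top left, top right, bottom right, bottom left corners
polygon = np.array(
    [
        [box[0], box[1]],
        [box[2], box[1]],
        [box[2], box[3]],
        [box[0], box[3]],
    ],
    dtype=np.float32,
)
assert polygon.shape == (4, 2)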
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
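The loader hunk swaps the multithreaded `workers` argument for an optional `collate_fn` and adds `__len__`. A usage sketch assuming the new signature shown above (the lambda is an illustrative custom collate function, not part of the library):

from doctr.datasets import CORD, DataLoader

train_set = CORD(train=True, download=True)
loader = DataLoader(
    train_set,
    shuffle=True,
    batch_size=32,
    drop_last=False,
    collate_fn=lambda samples: list(zip(*samples)),  # keep images as a tuple instead of stacking
)
print(len(loader))  # number of batches, via the new __len__
images, targets = next(iter(loader))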
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder the 8 flat values of each row into a (4, 2) array of (x, y) corner points
+ # (top left, top right, bottom right, bottom left); blank lines were already filtered out above
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionnary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char is still not in vocab, use the unknown character
char = unknown_char
translated += char
return translated
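A quick worked example of the normalization fallback (a sketch, assuming the "latin" vocab from doctr.datasets.vocabs, which contains unaccented ASCII characters):
>>> translate("naïve", "latin")
'naive'  # 'ï' is NFD-normalized to 'i'; untranslatable characters would become '■'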
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
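To make the padding layout concrete, here is a small worked example (a sketch; `eos`, `sos` and `pad` are deliberately chosen outside the vocab indices, as the checks above require):
>>> encode_sequences(["ab", "c"], vocab="abc", eos=3, sos=4, pad=5)
array([[4, 0, 1, 3, 5],
       [4, 2, 3, 5, 5]], dtype=int32)
# target_size resolves to max_len + 1 (eos) + 1 (sos) + 1 (pad) = 5,
# and each row reads [sos, chars..., eos, pad...]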
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
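The class-grouping step of `pre_transform_multiclass` in isolation (the relative-coordinate conversion is omitted here, since it needs a backend image tensor for `get_img_shape`):
>>> import numpy as np
>>> polys = np.zeros((3, 4, 2), dtype=np.float32)
>>> classes = ["words", "header", "words"]
>>> grouped = {k: [] for k in sorted(set(classes))}
>>> for k, poly in zip(classes, polys):
...     grouped[k].append(poly)
>>> {k: np.stack(v, axis=0).shape for k, v in grouped.items()}
{'header': (1, 4, 2), 'words': (2, 4, 2)}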
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- doctr.documents.elements - docTR documentation
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- doctr.documents.reader - docTR documentation
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 pdf;
- to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
<
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to unshrink polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: The first parameter.
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
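For intuition, the offset distance D = A * unclip_ratio / L (area times ratio over perimeter) scales with box thickness: a 100 x 20 px box has A = 2000 and L = 240, so with the default ratio of 1.5 it is expanded by D = 2000 * 1.5 / 240 = 12.5 px, roughly undoing the shrinkage applied to the ground truth during training.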
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box:  # remove too small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- # top-down pathway: iterate from the penultimate map down to the first
- # (range(len(results) - 1, -1) was empty because the -1 step was missing)
- for idx in range(len(results) - 2, -1, -1):
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature maps is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
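A note on the math above: `square_dist` is |ab|^2 while `square_dist_1` and `square_dist_2` are |pa|^2 and |pb|^2, so by the law of cosines `cosin` equals minus the cosine of the angle at p. The triangle-area identity then yields the perpendicular distance |pa| * |pb| * sin(apb) / |ab|, and when `cosin < 0` the code falls back to the distance to the nearer endpoint of the segment.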
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon: array of coordinates delimiting the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)  # np.bool is a deprecated alias of the builtin bool
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np array of relative boxes for the bitmap, each box is a 5-element list
- containing xmin, ymin, xmax, ymax, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num):  # labels start at 1; 0 is the background
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
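To make the conversion above concrete, here is a minimal sketch (assuming the parent class provides `box_score` as used above) that feeds a synthetic binarized map through the post-processor:

>>> import numpy as np
>>> postproc = LinkNetPostProcessor()
>>> prob_map = np.zeros((1024, 1024), dtype=np.float32)
>>> prob_map[100:200, 300:500] = 1.0  # a single connected component
>>> boxes = postproc.bitmap_to_boxes(pred=prob_map, bitmap=prob_map)
>>> # one row of (xmin, ymin, xmax, ymax, score), coordinates relative to the image size

Using the map itself as `pred` gives the component an objectness score of 1, so it survives the `box_thresh` filter.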
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
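A quick illustration of the target construction above (a sketch, assuming the default constructor arguments): a relative box is scaled to the output map and filled with ones, while ambiguous or too-small boxes only mask the loss:

>>> import numpy as np
>>> model = LinkNet()
>>> target = [{'boxes': np.array([[0.1, 0.1, 0.4, 0.3]]), 'flags': np.array([False])}]
>>> seg_target, seg_mask = model.compute_target(target, (1, 128, 128))
>>> seg_target.shape, seg_mask.shape  # both (1, 128, 128); the box area of seg_target is 1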
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
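Since `arch` now accepts a model instance as well as a name, the new call path can be exercised like this (a sketch, assuming `doctr.models.detection` exposes the constructors listed in ARCHS):

>>> from doctr.models import detection, detection_predictor
>>> model = detection.db_resnet50(pretrained=True, assume_straight_pages=False)
>>> predictor = detection_predictor(arch=model, assume_straight_pages=False)

Note that FAST models passed by name are reparameterized automatically before being wrapped, as done in `_predictor` above.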
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized TFLite model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
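All three helpers above return the serialized flatbuffer as bytes; a minimal usage sketch (the output file name is arbitrary):

>>> from tensorflow.keras import Sequential
>>> from doctr.models import quantize_model, conv_sequence
>>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
>>> with open('model_int8.tflite', 'wb') as f:
...     _ = f.write(quantize_model(model, (224, 224, 3)))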
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill([logits.shape[0]], logits.shape[1]),  # tf.fill expects a 1-D dims vector
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs CTC decoding of the raw output, then maps the predictions
- back to characters with the label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
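A short sketch of this decoding path on random logits (assuming, as the parent docstring suggests, that the post-processor is built from a vocabulary string and that `_embedding` is derived from it):

>>> import tensorflow as tf
>>> vocab = 'abc'
>>> postproc = CTCPostProcessor(vocab=vocab)
>>> logits = tf.random.uniform((2, 32, len(vocab) + 1))  # N x SEQ_LEN x (num_classes + 1)
>>> words = postproc(logits)  # a list of 2 decoded strings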
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of ground-truth labels for the batch
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
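The shape comments above can be verified with a small sketch on random tensors:

>>> import tensorflow as tf
>>> attn = AttentionModule(attention_units=512)
>>> features = tf.random.uniform((1, 8, 32, 512))  # (N, H, W, C)
>>> hidden_state = tf.zeros((1, 1, 1, 512))  # (N, 1, 1, rnn_units)
>>> glimpse = attn(features, hidden_state)  # (N, C): attention-weighted sum over H x W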
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill([features.shape[0]], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, 1)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + 1) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
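As on the detection side, `arch` may now be a model instance instead of a name (a sketch, assuming `doctr.models.recognition` exposes the constructors listed in ARCHS):

>>> from doctr.models import recognition, recognition_predictor
>>> model = recognition.crnn_vgg16_bn(pretrained=True)
>>> predictor = recognition_predictor(arch=model, batch_size=64)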
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates the page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
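A sketch of the richer configuration surface introduced above, combining rotated-page handling with straight-box export (all flags shown appear in the new signature):

>>> import numpy as np
>>> from doctr.models import ocr_predictor
>>> model = ocr_predictor(
...     det_arch='db_resnet50',
...     reco_arch='crnn_vgg16_bn',
...     pretrained=True,
...     assume_straight_pages=False,
...     export_as_straight_boxes=True,
... )
>>> out = model([(255 * np.random.rand(600, 800, 3)).astype(np.uint8)])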
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates the page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example::
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example::
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example::
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example::
- >>> from doctr.transforms import RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomJpegQuality, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns:
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
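A small sketch of how the four tolerance levels behave (assuming `doctr.utils.metrics` exposes `TextMatch` as above; the expected numbers follow directly from the `string_match` definition):

from doctr.utils.metrics import TextMatch

metric = TextMatch()
metric.update(gt=["Hello", "EUR"], pred=["hello", "€"])
# "Hello"/"hello" matches once case is ignored; "EUR"/"€" matches after
# anyascii folding; unicase (case + unicode tolerant) therefore scores highest
print(metric.summary())  # expected: raw=0.0, caseless=0.5, anyascii=0.5, unicase=1.0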
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
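For illustration, a tiny worked example of `box_iou` (a sketch; absolute coordinates are used here since the function is agnostic to scale):

import numpy as np
from doctr.utils.metrics import box_iou

boxes_a = np.array([[0, 0, 100, 100]], dtype=np.float32)
boxes_b = np.array([[0, 0, 50, 50], [200, 200, 300, 300]], dtype=np.float32)
# First pair: intersection 2500 / union 10000 = 0.25; second pair is disjoint
print(box_iou(boxes_a, boxes_b))  # [[0.25 0.  ]]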
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
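A worked example of `polygon_iou` on two overlapping unit squares (a sketch; the (N, 4, 2) layout matches the assertion above and requires shapely):

import numpy as np
from doctr.utils.metrics import polygon_iou

square = np.array([[[0, 0], [1, 0], [1, 1], [0, 1]]], dtype=np.float32)
shifted = np.array([[[0.5, 0], [1.5, 0], [1.5, 1], [0.5, 1]]], dtype=np.float32)
# Intersection 0.5 / union 1.5, so the IoU is one third
print(polygon_iou(square, shifted))  # [[0.33333334]]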
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
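A short sketch of the greedy suppression above (kept indices come out in decreasing score order):

import numpy as np
from doctr.utils.metrics import nms

boxes = np.array([
    [0, 0, 100, 100, 0.9],      # kept: highest score
    [5, 5, 105, 105, 0.8],      # dropped: IoU ~0.82 with the first box
    [200, 200, 300, 300, 0.7],  # kept: disjoint from every kept box
])
print(nms(boxes, thresh=0.5))  # indices kept: [0, 2]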
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns:
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
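To make the thresholding concrete, a minimal sketch (the 70x70 prediction reaches IoU 0.49 against the 100x100 ground truth, just under the 0.5 threshold):

import numpy as np
from doctr.utils.metrics import LocalizationConfusion

metric = LocalizationConfusion(iou_thresh=0.5)
metric.update(
    np.array([[0, 0, 100, 100]]),                     # 1 ground-truth box
    np.array([[0, 0, 70, 70], [110, 95, 200, 150]]),  # 2 predicted boxes
)
recall, precision, mean_iou = metric.summary()
# Both recall and precision are 0.0 here: the best IoU (0.49) misses the threshold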
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns:
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
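A sketch pairing localization with string comparison (the predicted box clears the IoU threshold, so only the transcription tolerance differs across the returned entries):

import numpy as np
from doctr.utils.metrics import OCRMetric

metric = OCRMetric(iou_thresh=0.5)
metric.update(
    np.array([[0, 0, 100, 100]]),  # ground-truth box
    np.array([[0, 0, 95, 95]]),    # predicted box, IoU ~0.9
    ["hello"],
    ["Hello"],                     # wrong case only
)
recall, precision, mean_iou = metric.summary()
# recall["raw"] == 0.0 while recall["caseless"] == 1.0 for this toy pair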
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns:
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
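And the class-index counterpart, sketched on a single correctly classified box:

import numpy as np
from doctr.utils.metrics import DetectionMetric

metric = DetectionMetric(iou_thresh=0.5)
metric.update(
    np.array([[0, 0, 100, 100]]),
    np.array([[0, 0, 95, 95]]),
    np.zeros(1, dtype=np.int64),  # ground-truth class indices
    np.zeros(1, dtype=np.int64),  # predicted class indices
)
print(metric.summary())  # (1.0, 1.0, 0.9): matched box, matched class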
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
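A sketch of the dispatch logic above (note that `create_obj_patch` is a module-level helper rather than part of the public `__all__`, so importing it directly is an assumption):

import numpy as np
from doctr.utils.visualization import create_obj_patch

# Straight box: ((xmin, ymin), (xmax, ymax)) in relative coords -> Rectangle
rect = create_obj_patch(((0.1, 0.1), (0.4, 0.3)), page_dimensions=(600, 800))
# Rotated box: (4, 2) array of relative corner coordinates -> Polygon
poly = create_obj_patch(
    np.array([[0.1, 0.1], [0.4, 0.1], [0.4, 0.3], [0.1, 0.3]]), (600, 800)
)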
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
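For instance (a sketch; like `create_obj_patch`, `get_colors` is an internal helper, so the direct import is an assumption):

from doctr.utils.visualization import get_colors

palette = get_colors(3)
# Three RGB tuples spread evenly in hue, with jittered lightness/saturation
assert len(palette) == 3 and all(len(rgb) == 3 for rgb in palette)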
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mlp Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mlp Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
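A minimal sketch of `draw_boxes` on a blank page (assumes matplotlib and OpenCV are installed, as the module imports above suggest):

import numpy as np
from doctr.utils.visualization import draw_boxes

image = np.full((600, 800, 3), 255, dtype=np.uint8)         # blank white page
boxes = np.array([[0.1, 0.1, 0.4, 0.3]], dtype=np.float32)  # relative (xmin, ymin, xmax, ymax)
draw_boxes(boxes, image, color=(255, 0, 0))  # rectangles drawn in red; call plt.show() to display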
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
-
-
+
+
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
-
-
+
+
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-.. autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same height in different columns belong to two separate Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developer mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Whether performed at once or separately, each task corresponds to its own type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
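-
-As a minimal sketch of such a predictor in use (assuming the detection API of this release, which accepts a list of numpy images):
-
- >>> import numpy as np
- >>> from doctr.models.detection import detection_predictor
- >>> predictor = detection_predictor(pretrained=True)
- >>> page = (255 * np.random.rand(1024, 1024, 3)).astype(np.uint8)
- >>> out = predictor([page])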
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up, then measure its average speed on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-Experiments were run on an AWS c5.12xlarge instance (Intel Xeon Platinum 8275L CPU).
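-
-A minimal sketch of this timing protocol (assuming ``model`` is any instantiated detection model):
-
- >>> import time
- >>> import tensorflow as tf
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> for _ in range(100):  # warm-up runs
- ...     _ = model(input_t, training=False)
- >>> start = time.time()
- >>> for _ in range(1000):  # timed runs, batch size of 1
- ...     _ = model(input_t, training=False)
- >>> fps = 1000 / (time.time() - start)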
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
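-
-A rough sketch of these three steps (the normalization statistics below are illustrative placeholders, not the actual training values):
-
- >>> import tensorflow as tf
- >>> img = tf.random.uniform(shape=[900, 600, 3], maxval=1, dtype=tf.float32)
- >>> img = tf.image.resize(img, (1024, 1024), method="bilinear")  # 1. resize, possibly deforming
- >>> batch = tf.expand_dims(img, axis=0)                          # 2. batch
- >>> batch = (batch - (0.5, 0.5, 0.5)) / (0.25, 0.25, 0.25)       # 3. normalize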
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (a binary segmentation map, for instance) into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors let you pass numpy images as inputs and get structured information back.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 30595 word-level crops, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 32, 128, 3] as a warm-up, then measure its average speed on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-Experiments were run on an AWS c5.12xlarge instance (Intel Xeon Platinum 8275L CPU).
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
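-
-A rough sketch of the resizing and padding steps (sizes follow the (32, 128) input shape used above):
-
- >>> import tensorflow as tf
- >>> crop = tf.random.uniform(shape=[24, 160, 3], maxval=1, dtype=tf.float32)
- >>> crop = tf.image.resize(crop, (32, 128), preserve_aspect_ratio=True)  # 1. resize without deformation
- >>> crop = tf.image.pad_to_bounding_box(crop, 0, 0, 32, 128)             # 2. zero-pad to the target size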
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence) into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Predictors combine the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed as follows: we instantiate the predictor, warm up the model, and then measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-Experiments were run on an AWS c5.12xlarge instance (Intel Xeon Platinum 8275L CPU).
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-These architectures involve one stage of text detection and one stage of text recognition. The text detection output is used to produce cropped images that are passed to the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
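-
-As a short usage sketch (assuming a PDF is available at the given path; the file reading helpers are documented in :doc:`documents`):
-
- >>> from doctr.documents import DocumentFile
- >>> from doctr.models import ocr_predictor
- >>> predictor = ocr_predictor(pretrained=True)
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
- >>> result = predictor(pages)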
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
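-
-A brief sketch, assuming each helper takes a trained Keras model and returns a serialized byte string:
-
- >>> from doctr.models import db_resnet50
- >>> from doctr.models.export import convert_to_tflite, convert_to_fp16
- >>> model = db_resnet50(pretrained=True)
- >>> serialized = convert_to_tflite(model)     # TFLite export
- >>> serialized_fp16 = convert_to_fp16(model)  # half-precision variant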
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both the training and inference procedures. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
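-
-A short sketch of composition (the normalization values here are arbitrary):
-
- >>> import tensorflow as tf
- >>> from doctr.transforms import Compose, Resize, Normalize
- >>> transfo = Compose([Resize((32, 32)), Normalize(mean=(0.5, 0.5, 0.5), std=(1., 1., 1.))])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))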
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module gathers non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model's performance.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
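-
-A minimal sketch of the expected workflow (the (xmin, ymin, xmax, ymax) box format is an assumption here):
-
- >>> import numpy as np
- >>> from doctr.utils.metrics import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[3, 4, 104, 105]]))
- >>> metric.summary()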
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
- doctr.datasets - docTR documentation
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as a mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
- doctr.documents - docTR documentation
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page's size
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, two words on the same horizontal level but in different columns belong to two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into images in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF file as a byte stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
Supported contribution modules
-
+
diff --git a/v0.1.0/modules/datasets.html b/v0.1.0/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.0/modules/datasets.html
+++ b/v0.1.0/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.0/modules/io.html b/v0.1.0/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.0/modules/io.html
+++ b/v0.1.0/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.0/modules/models.html b/v0.1.0/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.0/modules/models.html
+++ b/v0.1.0/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/transforms.html b/v0.1.0/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.0/modules/transforms.html
+++ b/v0.1.0/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶<
-
+
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
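Each searchindex.js change in this diff replaces a single minified Search.setIndex(...) call. For orientation, the sketch below reconstructs the top-level shape of that payload; the field names and sample entries are taken from the index text in this diff, while the overall structure and the stub Search object are assumptions for illustration, not real build output.

// Stand-in for the Search object defined in searchtools.js, so this sketch runs on its own.
const Search = { setIndex(index) { this._index = index; } };

// Reconstructed payload shape -- field names from the diff, values abbreviated.
Search.setIndex({
  alltitles: { "Changelog": [[0, null]] },              // section title -> [doc id, anchor]
  docnames: ["changelog", "community/resources"],        // document identifiers
  filenames: ["changelog.rst", "community/resources.rst"], // source files, same order
  titles: ["Changelog", "Community resources"],          // page titles, same order
  titleterms: { changelog: 0 },                          // stemmed title words -> doc ids
  terms: { abl: [17, 19] },                              // stemmed body words -> doc ids
  indexentries: {},                                      // general index entries
  objects: {}, objnames: {}, objtypes: {},               // API object inventory
  envversion: { sphinx: 62 },                            // build environment versions
});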
diff --git a/v0.1.0/using_doctr/custom_models_training.html b/v0.1.0/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.0/using_doctr/custom_models_training.html
+++ b/v0.1.0/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.0/using_doctr/running_on_aws.html b/v0.1.0/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.0/using_doctr/running_on_aws.html
+++ b/v0.1.0/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
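Beyond dropping the old file headers, the searchtools.js hunks above extend every result tuple from [docname, title, anchor, descr, score, filename] to include a kind (via the new SearchResultKind enum), tag each rendered result <li> with a matching kind-* class, mark the result list with role="list", and switch the status message to Documentation.ngettext for proper pluralization. A minimal sketch of how a theme could consume those new classes follows; the icon map and prepend behaviour are illustrative assumptions, not part of this diff.

// Illustrative theme-side hook (assumption, not part of the diff): decorate
// each search result using the kind-* class that _displayItem now adds.
const KIND_ICONS = { index: "≡", object: "{}", text: "¶", title: "§" }; // invented icons
document.querySelectorAll("#search-results ul.search li").forEach((li) => {
  // _displayItem sets listItem.classList.add(`kind-${kind}`), with kind being one
  // of the SearchResultKind values: "index", "object", "text", "title".
  const kindClass = [...li.classList].find((c) => c.startsWith("kind-"));
  if (!kindClass) return;
  const icon = KIND_ICONS[kindClass.slice("kind-".length)];
  if (icon) li.prepend(`${icon} `); // Element.prepend accepts plain strings
});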
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
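For orientation, searchindex.js is the lookup table that Sphinx generates for the documentation's client-side search, and the hunk above replaces the entire index in a single minified line. Its overall shape, sketched below as a Python dict with invented entries rather than docTR's real data, maps stemmed terms, titles, and API objects to document ids:

# Illustrative shape of a Sphinx search index; all entries here are invented.
search_index = {
    "docnames": ["index", "installation"],                 # doc id -> source page
    "alltitles": {"Installation": [[1, "installation"]]},  # title -> [doc id, anchor]
    "terms": {"instal": [0, 1], "pytorch": 1},             # stemmed term -> doc id(s)
    # API cross-references: [doc id, object-type id, priority, anchor, display name]
    "objects": {"doctr.io": [[0, 1, 1, "", "read_pdf"]]},
}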
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
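The rewritten constructor above exposes three dataset views through its new flags. A quick usage sketch (the keyword arguments are exactly those shown in the diff; download=True assumes network access):

from doctr.datasets import CORD

# Default view: full pages, each quad reduced to a straight (xmin, ymin, xmax, ymax) box
train_set = CORD(train=True, download=True)
img, target = train_set[0]  # target holds "boxes" (np.ndarray) and "labels" (list of str)

# Keep the four-point polygons instead of reducing them to straight boxes
poly_set = CORD(train=True, download=True, use_polygons=True)

# Recognition view: words are pre-cropped from the page images, paired with their text
reco_set = CORD(train=True, download=True, recognition_task=True)
crop, word = reco_set[0]

# Setting recognition_task=True and detection_task=True together raises the
# ValueError enforced in the constructor above.

Note that unless recognition_task is enabled, the convert_target_to_relative pre-transform shown in the diff converts the absolute box coordinates to relative ones when a sample is read.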
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
- doctr.datasets.core - docTR documentation
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
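The removed doctr.datasets.core module above shows how VisionDataset was meant to be subclassed; its replacement lives in doctr.datasets.datasets per the import changes throughout this diff. A hypothetical sketch under the new positional signature (url, file_name, file_hash, extract_archive), with a made-up archive URL and empty targets:

import os
from typing import Any, Dict, List, Tuple

from doctr.datasets.datasets import VisionDataset


class ToyZipDataset(VisionDataset):
    URL = "https://example.com/toy_dataset.zip"  # hypothetical archive

    def __init__(self, **kwargs: Any) -> None:
        # Download and extract the archive, then index the extracted files
        super().__init__(self.URL, "toy_dataset.zip", None, True, **kwargs)
        self.data: List[Tuple[str, Dict[str, Any]]] = [
            (img_name, dict(boxes=[], labels=[])) for img_name in os.listdir(self.root)
        ]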
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
doctr.datasets.detection - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
doctr.datasets.doc_artefacts - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
doctr.datasets.generator.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
doctr.datasets.ic03 - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
doctr.datasets.ic13 - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
doctr.datasets.iiit5k - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
doctr.datasets.iiithws - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
doctr.datasets.imgur5k - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before passing them to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
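A usage sketch of the reworked DataLoader, including the collate_fn hook that replaces the removed workers argument (iteration protocol as in the class docstring; CORD stands in for any dataset yielding (image, target) samples):

from doctr.datasets import CORD, DataLoader

def keep_as_lists(samples):
    # Custom collation: keep images and targets as plain Python lists
    images, targets = zip(*samples)
    return list(images), list(targets)

train_set = CORD(train=True, download=True)
train_loader = DataLoader(train_set, batch_size=32)  # default: dataset.collate_fn, else tf.stack-based collate
print(len(train_loader))                             # number of batches, via the new __len__
custom_loader = DataLoader(train_set, batch_size=8, collate_fn=keep_as_lists)
images, targets = next(iter(custom_loader))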
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
doctr.datasets.mjsynth - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
doctr.datasets.ocr - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
doctr.datasets.recognition - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
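The coordinate handling above is plain NumPy; a standalone sketch of the 8-value-to-box reduction for a single polygon (reduction over axis 0 here, versus axis 1 for the stacked (N, 4, 2) array in the loop):

import numpy as np

row = ["10", "20", "110", "20", "110", "60", "10", "60"]  # hypothetical CSV row[:8]
poly = np.array(list(map(int, row)), dtype=np.float32).reshape((4, 2))
box = np.concatenate((poly.min(axis=0), poly.max(axis=0)))  # xmin, ymin, xmax, ymax
print(box)  # [ 10.  20. 110.  60.]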
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
doctr.datasets.svhn - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
doctr.datasets.svt - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
doctr.datasets.synthtext - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionnary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char still not in vocab, return unknown character)
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: a array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
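A small round trip through the updated codec helpers (a sketch: eos and pad must lie outside the vocab's index range, and with pad set each sequence receives one EOS before the padding symbols):

from doctr.datasets.utils import decode_sequence, encode_sequences

vocab = "abc"
encoded = encode_sequences(["ab", "cab"], vocab, eos=len(vocab), pad=len(vocab) + 1)
print(encoded)
# [[0 1 3 4 4]
#  [2 0 1 3 4]]
print(decode_sequence([0, 1], vocab))  # -> "ab"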
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
doctr.datasets.wildreceipt - docTR documentation
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- doctr.documents.elements - docTR documentation
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
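A sketch of how the element hierarchy defined in the removed module composes, bottom-up (geometry is relative ((xmin, ymin), (xmax, ymax)); the import path is the pre-refactor doctr.documents one):

from doctr.documents.elements import Block, Document, Line, Page, Word

word = Word("hello", 0.99, ((0.1, 0.1), (0.3, 0.15)))
line = Line([word])                        # geometry resolved from its words
page = Page([Block(lines=[line])], page_idx=0, dimensions=(595, 842))
doc = Document([page])
print(doc.render())                        # -> "hello"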
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- doctr.documents.reader - docTR documentation
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF file loaded as a fitz.Document
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Open the document with fitz; pages are rasterized later via convert_page_to_numpy
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Default goes to 840 x 595 for A4 pdf,
- if you want to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
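A sketch of the removed PyMuPDF-backed reader API (paths are placeholders):

from doctr.documents import DocumentFile

pdf = DocumentFile.from_pdf("path/to/doc.pdf")      # PDF wrapper around a fitz.Document
pages = pdf.as_images()                             # list of H x W x 3 ndarrays
words = pdf.get_words()                             # per page: [(bbox, value), ...]
imgs = DocumentFile.from_images(["page1.png", "page2.png"])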
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
doctr.io.elements - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
doctr.io.html - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
doctr.io.image.base - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
doctr.io.image.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
doctr.io.pdf - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
doctr.io.reader - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
doctr.models.classification.mobilenet.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
doctr.models.classification.resnet.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
doctr.models.classification.textnet.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
doctr.models.classification.vgg.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
doctr.models.classification.vit.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
doctr.models.classification.zoo - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to expand (unclip) the shrunk polygons
- max_candidates: maximum number of boxes to consider on a single page
- box_thresh: minimal objectness score to keep a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: The first parameter.
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
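-
- Example (editor's illustrative sketch of the unclip step; a toy square
- and the default unclip_ratio of 1.5 assumed)::
- >>> from shapely.geometry import Polygon
- >>> import pyclipper
- >>> pts = [(0, 0), (10, 0), (10, 10), (0, 10)]
- >>> poly = Polygon(pts)
- >>> distance = poly.area * 1.5 / poly.length # area=100, perimeter=40 -> 3.75
- >>> offset = pyclipper.PyclipperOffset()
- >>> offset.AddPath(pts, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- >>> expanded = offset.Execute(distance) # vertices pushed ~3.75px outwards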
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor of boxes for the bitmap, each box being a 5-element list
- containing xmin, ymin, xmax, ymax, score (relative coordinates)
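-
- Example (editor's sketch of the relative-coordinate conversion above;
- toy page size and box assumed)::
- >>> height, width = 256, 512
- >>> x, y, w, h = 64, 32, 128, 48
- >>> (x / width, y / height, (x + w) / width, (y + h) / height)
- (0.125, 0.125, 0.375, 0.3125)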
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove boxes that are too small
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channels to output
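-
- Example (editor's shape sketch, assuming ResNet-50 feature maps for a
- 1024x1024 input)::
- >>> import tensorflow as tf
- >>> fpn = FeaturePyramidNetwork(channels=128)
- >>> sizes, chans = (256, 128, 64, 32), (256, 512, 1024, 2048)
- >>> fmaps = [tf.random.uniform((1, s, s, c)) for s, c in zip(sizes, chans)]
- >>> fpn(fmaps).shape # four 128-channel branches, upsampled to 256x256
- TensorShape([1, 256, 256, 512])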
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1): # top-down pass, coarsest into finest
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature maps is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
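- Example (editor's worked check; the point (3, 4) projects onto the
- segment a=(0, 0) -> b=(10, 0), so the distance is simply |y| = 4)::
- >>> import numpy as np
- >>> xs, ys = np.array([[3.]]), np.array([[4.]])
- >>> DBNet.compute_distance(xs, ys, np.array([0., 0.]), np.array([10., 0.]))
- array([[4.]])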
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon : array of coord., to draw the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
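-
- Example (editor's sketch of the offset distance; a 10x10 square and the
- shrink_ratio of 0.4 set in the constructor assumed)::
- >>> from shapely.geometry import Polygon
- >>> poly = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
- >>> round(poly.area * (1 - 0.4 ** 2) / poly.length, 3) # area=100, perim.=40
- 2.1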
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, fully masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionaries, where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
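-
- Example (editor's sketch of the approximate binarization used below;
- the k=50 steepness comes from the implementation)::
- >>> import numpy as np
- >>> p = np.array([0.2, 0.5, 0.8])
- >>> np.round(1 / (1 + np.exp(-50. * (p - 0.5))), 4) # threshold t=0.5
- array([0. , 0.5, 1. ])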
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor of boxes for the bitmap, each box being a 5-element list
- containing xmin, ymin, xmax, ymax, score (relative coordinates)
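-
- Example (editor's illustrative sketch of the labelling step; toy bitmap
- assumed)::
- >>> import numpy as np, cv2
- >>> bitmap = np.zeros((8, 8), np.uint8)
- >>> bitmap[1:3, 1:5] = 1
- >>> bitmap[5:7, 4:7] = 1
- >>> cv2.connectedComponents(bitmap, connectivity=4)[0] # background + 2 blobs
- 3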
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num): # label 0 is the background
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, fully masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionaries, where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
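+
+# Editor's illustrative sketch (assumes a pretrained backend is available):
+# since `arch` also accepts a model instance, a custom-trained detector can be
+# wrapped directly.
+# >>> from doctr.models import db_resnet50, detection_predictor
+# >>> model = db_resnet50(pretrained=True)
+# >>> predictor = detection_predictor(arch=model, assume_straight_pages=True)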
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized TFLite model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
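-
-# Editor's illustrative follow-up (sketch): the serialized bytes returned by
-# the converters above can be reloaded for inference with the TFLite
-# interpreter (here `serialized_model` comes from the docstring examples).
-# >>> import tensorflow as tf
-# >>> interpreter = tf.lite.Interpreter(model_content=serialized_model)
-# >>> interpreter.allocate_tensors()
-# >>> input_details = interpreter.get_input_details()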
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
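-
- Example (editor's worked sketch; 2 classes + blank, with greedy path
- class0, class0, blank, class1 assumed)::
- >>> import tensorflow as tf
- >>> logits = tf.constant([[[9., 0., 0.], [9., 0., 0.], [0., 0., 9.], [0., 9., 0.]]])
- >>> probs = tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2]))
- >>> decoded = tf.nn.ctc_greedy_decoder(probs, tf.fill([1], 4), merge_repeated=True)[0][0]
- >>> tf.sparse.to_dense(decoded).numpy() # repeats merged, blank dropped
- array([[0, 1]])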
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
- with the label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of ground-truth label strings (encoded internally into gt and seq_len)
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
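- Example (editor's shape sketch; toy dimensions assumed)::
- >>> import tensorflow as tf
- >>> att = AttentionModule(attention_units=512)
- >>> feats = tf.random.uniform((2, 4, 32, 512))
- >>> hidden = tf.random.uniform((2, 1, 1, 512))
- >>> att(feats, hidden).shape # one pooled glimpse vector per sample
- TensorShape([2, 512])
-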
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embeded_symbol: shape (N, embedding_units)
- embeded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embeded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- model_output: predicted logits of the model
- gt: the encoded tensor with gt labels
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
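The post-processor above turns per-timestep logits into strings by joining the predicted characters and truncating at the first <eos>. A minimal standalone sketch of that decoding step, assuming a toy three-character vocabulary rather than the actual docTR vocab:

import tensorflow as tf

vocab = "abc"
# index len(vocab) is reserved for the <eos> token
embedding = tf.constant(list(vocab) + ["<eos>"], dtype=tf.string)
# fake argmax predictions for a batch of 2 words: "ab" and "c"
pred = tf.constant([[0, 1, 3, 2], [2, 3, 0, 1]], dtype=tf.int32)
# join characters over the time axis, then keep everything before <eos>
joined = tf.strings.reduce_join(tf.nn.embedding_lookup(embedding, pred), axis=-1)
words = tf.strings.split(joined, "<eos>")
words = tf.sparse.to_dense(words.to_sparse(), default_value="")[:, 0]
print([w.decode() for w in words.numpy()])  # ['ab', 'c']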
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
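For reference, the sequence masking performed in SAR.compute_loss above can be reproduced in isolation. A sketch with made-up shapes (batch of 2, 4 timesteps, 5 classes including <eos>); the tensors are placeholders, not real model outputs:

import tensorflow as tf

logits = tf.random.normal((2, 4, 5))            # model output: (N, T, vocab_size + 1)
gt = tf.constant([[1, 2, 4, 0], [3, 4, 0, 0]])  # encoded labels, 4 = <eos>
seq_len = tf.constant([2, 1]) + 1               # word lengths, +1 for the <eos> step

# cross-entropy per timestep, then zero out everything after <eos>
cce = tf.nn.softmax_cross_entropy_with_logits(tf.one_hot(gt, depth=5), logits)
mask = tf.sequence_mask(seq_len, maxlen=tf.shape(logits)[1])
masked = tf.where(mask, cce, tf.zeros_like(cce))
loss = tf.reduce_sum(masked, axis=1) / tf.cast(seq_len, tf.float32)  # (N,)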
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
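The reworked predictor now accepts either an architecture name or a model instance. A hedged usage sketch, assuming pretrained checkpoints are available for download:

import numpy as np
from doctr.models import crnn_vgg16_bn, recognition_predictor

# from an architecture name
predictor = recognition_predictor("crnn_vgg16_bn", pretrained=True, symmetric_pad=True)
# or from an already-instantiated model
predictor = recognition_predictor(crnn_vgg16_bn(pretrained=True))

crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)  # a word crop
out = predictor([crop])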
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
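A short usage sketch for the new KIE predictor, mirroring its docstring example (pretrained weights assumed available):

import numpy as np
from doctr.models import kie_predictor

model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
out = model([input_page])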
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.transforms.modules - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example::
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example::
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- scaling the saturation by a random factor.
-
- Example::
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Gamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
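The modules deleted above were split into base/tensorflow files, but the composition pattern is unchanged. A small augmentation pipeline combining them, assuming the same class names remain importable from doctr.transforms in this version:

import tensorflow as tf
from doctr.transforms import Compose, OneOf, RandomApply, RandomBrightness, RandomContrast, RandomJpegQuality

augment = Compose([
    RandomApply(RandomBrightness(max_delta=0.3), p=0.5),  # applied half of the time
    OneOf([RandomContrast(delta=0.3), RandomJpegQuality(min_quality=60)]),  # pick exactly one
])
out = augment(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))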
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
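To make the four tolerance levels concrete, here is string_match restated standalone with two worked pairs; the outputs follow from anyascii stripping accents while keeping ASCII unchanged:

from anyascii import anyascii

def string_match(word1: str, word2: str):
    raw = word1 == word2
    caseless = word1.lower() == word2.lower()
    anyascii_match = anyascii(word1) == anyascii(word2)
    # lower-casing is applied after transliteration, per the ordering note above
    unicase = anyascii(word1).lower() == anyascii(word2).lower()
    return raw, caseless, anyascii_match, unicase

print(string_match("Hello", "hello"))  # (False, True, False, True)
print(string_match("café", "cafe"))    # (False, False, True, True)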
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
+
+
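A quick numerical check of polygon_iou on two overlapping squares given as (N, 4, 2) corner arrays; the expected value follows from an intersection area of 1 over a union of 7:

import numpy as np
from doctr.utils.metrics import polygon_iou

square = np.array([[[0, 0], [2, 0], [2, 2], [0, 2]]], dtype=np.float32)   # 2x2 square
shifted = np.array([[[1, 1], [3, 1], [3, 3], [1, 3]]], dtype=np.float32)  # shifted copy
print(polygon_iou(square, shifted))  # [[0.14285715]] i.e. 1 / 7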
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
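A tiny NMS example in the (xmin, ymin, xmax, ymax, score) layout expected above: the second box overlaps the first with IoU 0.81 and gets suppressed, while the disjoint third box is kept:

import numpy as np
from doctr.utils.metrics import nms

boxes = np.array([
    [0, 0, 10, 10, 0.9],
    [1, 1, 10, 10, 0.8],    # IoU 0.81 with the first box -> suppressed
    [20, 20, 30, 30, 0.7],  # disjoint -> kept
])
print(nms(boxes, thresh=0.5))  # [0, 2]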
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
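The pair assignment above replaces the removed assign_pairs helper: the Hungarian algorithm maximizes total IoU, and pairs below the threshold are then discarded. A standalone sketch with a hand-written IoU matrix:

import numpy as np
from scipy.optimize import linear_sum_assignment

iou_mat = np.array([[0.7, 0.1],
                    [0.2, 0.4]])
gt_idx, pred_idx = linear_sum_assignment(-iou_mat)  # negate to maximize IoU
matches = int((iou_mat[gt_idx, pred_idx] >= 0.5).sum())
print(gt_idx, pred_idx, matches)  # [0 1] [0 1] 1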
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequences to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
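The pair assignment above is SciPy's Hungarian solver run on the negated IoU matrix, so the chosen one-to-one matching maximizes total IoU before the threshold is applied; a standalone sketch with invented values:

>>> import numpy as np
>>> from scipy.optimize import linear_sum_assignment
>>> iou_mat = np.array([[0.7, 0.1], [0.0, 0.4]])  # rows: ground truths, columns: predictions
>>> gt_idx, pred_idx = linear_sum_assignment(-iou_mat)  # picks pairs (0, 0) and (1, 1)
>>> is_kept = iou_mat[gt_idx, pred_idx] >= 0.5  # the 0.5 threshold keeps only pair (0, 0)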
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
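Putting the pieces together, a minimal sketch of how DetectionMetric is driven (synthetic boxes and labels; call update once per batch):

>>> import numpy as np
>>> from doctr.utils.metrics import DetectionMetric
>>> metric = DetectionMetric(iou_thresh=0.5)
>>> metric.update(np.array([[0.1, 0.1, 0.4, 0.4]]), np.array([[0.12, 0.11, 0.38, 0.42]]),
>>>               np.zeros(1, dtype=np.int64), np.zeros(1, dtype=np.int64))
>>> recall, precision, mean_iou = metric.summary()  # here: (1.0, 1.0, 0.79)
>>> metric.reset()  # clear the counters before evaluating another model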
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor was called with preserve_aspect_ratio=True
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
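As a quick orientation, a sketch of calling rect_patch directly (coordinates invented; the create_obj_patch dispatcher defined further down is the usual entry point):

>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> fig, ax = plt.subplots()
>>> ax.imshow(np.zeros((600, 800, 3)))  # blank 600x800 page
>>> patch = rect_patch(((0.1, 0.2), (0.4, 0.3)), (600, 800), label="word", color=(0, 0, 1))
>>> ax.add_patch(patch)  # relative ((xmin, ymin), (xmax, ymax)) box drawn in blue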
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor was called with preserve_aspect_ratio=True
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
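A short illustration of the dispatch logic above (geometries invented for the example):

>>> import numpy as np
>>> create_obj_patch(((0.1, 0.1), (0.3, 0.2)), (600, 800))  # 2-point tuple -> rect_patch
>>> poly = np.array([[0.1, 0.1], [0.3, 0.1], [0.3, 0.2], [0.1, 0.2]])
>>> create_obj_patch(poly, (600, 800))  # (4, 2) array -> polygon_patch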
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
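This helper feeds visualize_kie_page below, which gives each prediction class its own hue; a sketch with invented labels:

>>> labels = ["date", "total", "vendor"]
>>> colors = {k: c for c, k in zip(get_colors(len(labels)), labels)}  # one RGB triple per label

Exact values vary between runs, since lightness and saturation are drawn at random.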
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mlp Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mlp Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
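Finally, a hedged usage sketch for draw_boxes (synthetic image and box; note the function displays the annotated image itself via matplotlib):

>>> import numpy as np
>>> image = np.zeros((200, 300, 3), dtype=np.uint8)
>>> boxes = np.array([[0.1, 0.1, 0.5, 0.4]])  # one relative (xmin, ymin, xmax, ymax) box
>>> draw_boxes(boxes, image, color=(0, 255, 0))  # defaults to (0, 0, 255) when color is omitted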
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
-
-
+
+
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
-
-
+
+
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-..autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developper mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Either performed at once or separately, to each task corresponds a type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors lets you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, we warm-up the model and then we measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection will be used to produces cropped images that will be passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedure. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module regroups non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model performances.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
- doctr.datasets - docTR documentation
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset forPost-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-..autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = CORD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before passing it to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-¶
-
-
-
-
-
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
- doctr.documents - docTR documentation
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page's size
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degress and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
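Since the return value is a PDF byte stream, it can be written straight to disk (the file name is a placeholder):
>>> from doctr.documents import read_html
>>> pdf_bytes = read_html("https://www.yoursite.com")
>>> with open("webpage.pdf", "wb") as f:
...     f.write(pdf_bytes)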
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple file formats
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
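Because from_url returns a PDF object, it can be chained directly with the PDF methods documented below:
>>> from doctr.documents import DocumentFile
>>> pages = DocumentFile.from_url("https://www.yoursite.com").as_images()  # one ndarray per page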
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
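As the signature indicates, a single path is also accepted, and the result is still a list:
>>> from doctr.documents import DocumentFile
>>> pages = DocumentFile.from_images("path/to/your/page1.png")
>>> len(pages)
1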
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of per-page annotations, each represented as a list of (bounding box, value) tuples
-
-
-
-
-
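The nested result can be unpacked as sketched below; reading the 4-tuple as (xmin, ymin, xmax, ymax) follows the usual fitz convention and is an assumption here:
>>> from doctr.documents import DocumentFile
>>> for page_words in DocumentFile.from_pdf("path/to/your/doc.pdf").get_words():
...     for (xmin, ymin, xmax, ymax), value in page_words:  # assumed coordinate order
...         print(value)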
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of per-page artefacts, each represented as a list of bounding boxes
-
-
-
-
-
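For instance, to count artefacts across the whole document (a sketch):
>>> from doctr.documents import DocumentFile
>>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
>>> total = sum(len(page_artefacts) for page_artefacts in artefacts)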
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
Returns:¶
-
+
diff --git a/v0.1.0/modules/models.html b/v0.1.0/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.0/modules/models.html
+++ b/v0.1.0/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.0/modules/transforms.html b/v0.1.0/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.0/modules/transforms.html
+++ b/v0.1.0/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶<
-
+
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
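For orientation, the single minified line replaced above is Sphinx's generated search index. A minimal sketch of the payload shape, using field names taken from the blob itself and purely illustrative values:

// Shape of the Search.setIndex payload (values illustrative):
Search.setIndex({
  // section title -> [document index, anchor (null for page top)] pairs
  "alltitles": { "Installation": [[4, null]] },
  // position in these arrays is the document index used by every other field
  "docnames": ["changelog", "getting_started/installing"],
  "filenames": ["changelog.rst", "getting_started/installing.rst"],
  // versions of the Sphinx domains that built the index, used to detect staleness
  "envversion": { "sphinx": 62, "sphinx.domains.python": 4 },
  // API index entries -> [document index, anchor, isMainEntry] tuples
  "indexentries": { "artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]] }
});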
diff --git a/v0.1.0/using_doctr/custom_models_training.html b/v0.1.0/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.0/using_doctr/custom_models_training.html
+++ b/v0.1.0/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.0/using_doctr/running_on_aws.html b/v0.1.0/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.0/using_doctr/running_on_aws.html
+++ b/v0.1.0/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
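The stopword list above feeds the query handling in searchtools.js, whose diff follows. A minimal sketch of that filtering step, assuming the standard Sphinx behavior of dropping stopwords from the user's query (the query string is hypothetical):

// Hypothetical query, filtered against the stopwords array defined above:
const query = "how to install the library";
const terms = query.toLowerCase().split(/\s+/).filter((w) => !stopwords.includes(w));
// terms -> ["how", "install", "library"]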
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
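Taken together, the searchtools.js hunks above thread one new field through the search pipeline: every result tuple gains a seventh element, its SearchResultKind, and _displayItem surfaces it as a CSS class so themes can style result types differently. A minimal sketch of that flow, with illustrative docname, score, and filename values:

// One result as built by the updated title/index/object/text searches above:
const result = [
  "getting_started/installing",      // docname
  "Installation",                    // title
  "",                                // anchor
  null,                              // descr
  15,                                // score
  "getting_started/installing.rst",  // filename
  SearchResultKind.title,            // new: "index" | "object" | "text" | "title"
];
// _displayItem then tags the rendered <li> with the kind:
const [docName, title, anchor, descr, score, filename, kind] = result;
const listItem = document.createElement("li");
listItem.classList.add(`kind-${kind}`);  // themes may target li.kind-title, li.kind-text, ...

The ngettext change in _finishSearch is independent of this: it replaces the single templated status string with a properly pluralized pair.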
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶<
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
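Note: the reworked CORD loader above drops the TensorFlow-specific `__getitem__`/`collate_fn` pair in favour of backend-agnostic target handling: `use_polygons` keeps the four (x, y) corners of each word, while the mutually exclusive `recognition_task` and `detection_task` flags narrow each sample to (crop, text) pairs or (image, boxes) pairs. A minimal sketch of the three modes, assuming the archive can be fetched with download=True:

    from doctr.datasets import CORD

    # default: targets are stored as {"boxes": (N, 4) int array, "labels": [str, ...]}
    train_set = CORD(train=True, download=True)
    img, target = train_set[0]

    # recognition mode: each sample is a (word_crop, word_string) pair
    reco_set = CORD(train=True, download=True, recognition_task=True)

    # detection mode: each sample is an (image, (N, 4) boxes array) pair
    det_set = CORD(train=True, download=True, detection_task=True)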
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets.core - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
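The deleted `doctr.datasets.core` module provided the original `VisionDataset`: download a file into the `~/.cache/doctr/datasets` cache, verify its SHA256, and optionally extract the archive next to it. Its successor lives in `doctr.datasets.datasets`, as the updated imports in the hunks above show. A sketch of the behaviour the removed class encapsulated, with `url` and `sha256` as placeholders:

    # downloads the archive on first use, checks the hash, extracts it next to the zip
    ds = VisionDataset(
        url,                          # dataset URL
        file_name="cord_train.zip",
        file_hash=sha256,             # expected SHA256 of the download
        extract_archive=True,
        download=True,                # without this, a missing file raises ValueError
    )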
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
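FUNSD annotations store straight boxes, so the new `use_polygons` branch above expands each xmin, ymin, xmax, ymax quadruple into the four corner points expected downstream. The conversion is a one-liner per box; a sketch with numpy, using illustrative values:

    import numpy as np

    box = [10, 20, 110, 60]  # xmin, ymin, xmax, ymax
    poly = np.array(
        [[box[0], box[1]], [box[2], box[1]], [box[2], box[3]], [box[0], box[3]]],
        dtype=np.float32,
    )
    # poly has shape (4, 2): top left, top right, bottom right, bottom left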
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
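Beyond removing the multithreaded fetch, the `DataLoader` rework swaps the `workers` argument for an optional `collate_fn` and adds `__len__`, so the loader can report its batch count and the batching behaviour can be overridden without subclassing. A short usage sketch following the updated docstring; the lambda is just one possible custom merger:

    from doctr.datasets import CORD, DataLoader

    train_set = CORD(train=True, download=True)
    train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
    print(len(train_loader))  # number of batches per epoch, via the new __len__

    # optional: bypass dataset.collate_fn / default_collate with a custom merger
    raw_loader = DataLoader(train_set, batch_size=16, collate_fn=lambda samples: list(zip(*samples)))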
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates: 8 values -> (4, 2) array of (x, y) corners
+ # (top left, top right, bottom right, bottom left); blank rows were filtered out above
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
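The rewritten SROIE parser reads each annotation row once, keeps the label (which may itself contain commas) from column 9 onwards, and reshapes the first eight values into a (4, 2) corner array before optionally collapsing it back to a straight box. For a single row the transformation looks like this, with illustrative values:

    import numpy as np

    row = ["10", "20", "110", "20", "110", "60", "10", "60", "TOTAL", "12.00"]
    label = ",".join(row[8:])  # "TOTAL,12.00" -- commas inside labels survive
    corners = np.array(list(map(int, row[:8])), dtype=np.float32).reshape((4, 2))
    # straight-box mode: xmin, ymin, xmax, ymax
    straight = np.concatenate((corners.min(axis=0), corners.max(axis=0)))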
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char still not in vocab, return unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: a array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
-
-
+
+
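The renamed `encode_string` now raises a ValueError on out-of-vocab characters, and `encode_sequences` gained explicit `sos`, `pad` and `dynamic_seq_length` handling: when a pad symbol is given, every word is followed by one EOS and then padding. A small worked example, importing from the module as exported above:

    import numpy as np
    from doctr.datasets.utils import encode_sequences

    vocab = "abc"
    encoded = encode_sequences(["ab", "c"], vocab, target_size=5, eos=3, pad=4)
    # each character maps to its vocab index, then one EOS (3), then PAD (4):
    # array([[0, 1, 3, 4, 4],
    #        [2, 3, 4, 4, 4]], dtype=int32)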
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents.elements - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
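A minimal usage sketch of how these element classes compose, assuming the v0.2.0 import path `doctr.documents.elements` and the `Word(value, confidence, geometry)` constructor shown at the top of this module:
>>> from doctr.documents.elements import Word, Line, Block, Page, Document
>>> w1 = Word("hello", 0.99, ((0.10, 0.10), (0.30, 0.15)))
>>> w2 = Word("world", 0.98, ((0.35, 0.10), (0.55, 0.15)))
>>> line = Line([w1, w2])  # geometry resolves to the smallest enclosing bbox
>>> page = Page([Block(lines=[line])], page_idx=0, dimensions=(595, 842))
>>> Document([page]).render()
'hello world'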
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- doctr.documents.reader - docTR documentation
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of the image in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
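The `isinstance(file, bytes)` branch above means raw bytes work as well; a quick sketch:
>>> with open("path/to/your/doc.jpg", "rb") as f:
...     page = read_img(f.read(), output_size=(1024, 1024))
>>> page.shape
(1024, 1024, 3)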
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and open it with PyMuPDF
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document as a fitz.Document, ready for page rendering
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 pdf;
- to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the original size
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
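A usage sketch chaining `read_pdf` with this helper; the default scales of (2, 2) amount to roughly 144 dpi rendering:
>>> doc = read_pdf("path/to/your/doc.pdf")
>>> page = convert_page_to_numpy(doc[0], output_size=(1024, 726))
>>> page.shape
(1024, 726, 3)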
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a web page and convert it into a PDF file returned as a bytes stream
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
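Since the return value is a PDF byte stream, it can be fed straight back into `read_pdf` (sketch):
>>> pdf_bytes = read_html("https://www.yoursite.com")
>>> doc = read_pdf(pdf_bytes)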
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
<
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to unshrink (expand) polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: polygon vertices, as an array of shape (N, 2)
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to a ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
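The expansion distance implements the formula from the DB paper, distance = area * unclip_ratio / perimeter. A worked example for a 100 x 20 px rectangle with the default ratio of 1.5:
>>> poly = Polygon([(0, 0), (100, 0), (100, 20), (0, 20)])
>>> poly.area * 1.5 / poly.length  # offset applied outwards, in pixels
12.5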
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1):
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
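For intuition, a shape sketch assuming a 1024 x 1024 input and the ResNet-50 `fpn_layers` listed in `default_cfgs` (feature maps at strides 4/8/16/32):
# x            -> [(N, 256, 256, 256), (N, 128, 128, 512), (N, 64, 64, 1024), (N, 32, 32, 2048)]
# inner_blocks -> four maps of `channels` (here 128) channels, spatial sizes unchanged
# layer_blocks -> each branch upsampled by 2**idx, so all end at 256 x 256
# concatenate  -> (N, 256, 256, 4 * 128) = (N, 256, 256, 512)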
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature maps is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs: map of x coordinates (height, width)
- ys: map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
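A quick sanity check of the formula (the point (0, 1) lies at distance 1 from the segment joining (0, 0) and (2, 0)):
>>> xs, ys = np.array([[0.]]), np.array([[1.]])
>>> DBNet.compute_distance(xs, ys, np.array([0., 0.]), np.array([2., 0.]))
array([[1.]])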
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon threshold map on a canvas, as described in the DB paper
-
- Args:
- polygon: array of coordinates defining the polygon boundary
- canvas: threshold map to fill with polygons
- mask: mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
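With the scales hard-coded above, the total objective matches the DB paper: loss = 5 * L_prob (balanced BCE) + L_bin (weighted dice) + 10 * L_thresh (masked L1).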
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num): # labels run from 0 (background) to label_num - 1
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of targets and masks from a list of boxes and flags for each image,
- then compute the loss with the probability map, targets and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
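A sketch of the new code path that accepts a model instance instead of an architecture name (names as exposed by recent `doctr.models`):
>>> from doctr.models import detection_predictor, db_resnet50
>>> model = db_resnet50(pretrained=True)
>>> predictor = detection_predictor(arch=model, assume_straight_pages=False)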
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized TFLite model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
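Note that the representative dataset above is random noise, which is enough to exercise the int8 conversion but yields arbitrary calibration ranges; in practice you would feed a sample of real document images instead.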
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
- with the label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of ground-truth labels, encoded on the fly via compute_target
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len,))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
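To make the decoding step implemented by CTCPostProcessor above more concrete, here is a minimal NumPy sketch of greedy CTC decoding (an illustration, not docTR code): take the best class per timestep, collapse consecutive repeats, then drop the blank token stored at index len(vocab):

>>> import numpy as np
>>> def greedy_ctc_decode(logits, vocab):
...     # logits: (seq_len, len(vocab) + 1); the last class is the CTC blank
...     best_path = logits.argmax(axis=-1)
...     blank = len(vocab)
...     chars, prev = [], blank
...     for idx in best_path:
...         if idx != blank and idx != prev:  # drop blanks and collapsed repeats
...             chars.append(vocab[idx])
...         prev = idx
...     return ''.join(chars)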
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read: A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read: A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read: A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
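The sequence masking in SAR.compute_loss above can be illustrated with a short NumPy sketch (illustrative only): per-timestep cross-entropy values are kept up to and including the <eos> position, then averaged over the valid length:

>>> import numpy as np
>>> def masked_mean_ce(cce, seq_len):
...     # cce: (N, T) per-timestep cross-entropy; seq_len: (N,) lengths incl. <eos>
...     mask = np.arange(cce.shape[1])[None, :] < seq_len[:, None]
...     return (cce * mask).sum(axis=1) / seq_len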
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
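For reference, the reworked recognition_predictor shown above can be called on pre-cropped word images along these lines (a sketch based on the signature in this diff; in recent versions the predictor returns a list of (word, confidence) pairs):

>>> import numpy as np
>>> from doctr.models import recognition_predictor
>>> predictor = recognition_predictor('crnn_vgg16_bn', pretrained=True, symmetric_pad=True)
>>> crops = [(255 * np.random.rand(32, 128, 3)).astype(np.uint8)]
>>> words = predictor(crops)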
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
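The rotation-related options documented above combine as follows (a sketch under the signatures shown in this diff):

>>> from doctr.models import ocr_predictor, kie_predictor
>>> # handle rotated documents but keep axis-aligned output boxes
>>> ocr = ocr_predictor(pretrained=True, assume_straight_pages=False, export_as_straight_boxes=True)
>>> # the same options apply to the KIE predictor
>>> kie = kie_predictor(pretrained=True, straighten_pages=True, detect_language=True)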
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Applies a user-defined function to the input tensor
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert an RGB tensor (batch of images or image) to a 3-channel grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following transformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example::
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example::
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example::
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """Randomly performs gamma correction for a tensor (batch of images or image)
-
- Example::
- >>> from doctr.transforms import RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust JPEG quality of a 3-dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}, max_quality={self.max_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
\ No newline at end of file
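Putting the modules above together, a typical augmentation pipeline might look like this (illustrative parameter values):

>>> import tensorflow as tf
>>> from doctr.transforms import Compose, Resize, OneOf, RandomApply, RandomBrightness, RandomContrast, RandomGamma
>>> augment = Compose([
...     Resize((32, 128), preserve_aspect_ratio=True),
...     OneOf([RandomBrightness(max_delta=0.3), RandomContrast(delta=0.3)]),
...     RandomApply(RandomGamma(), p=0.5),
... ])
>>> out = augment(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))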
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
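A worked run of the class above (a sketch; the expected scores follow from string_match on each pair, with only "world" matching exactly):

>>> from doctr.utils.metrics import TextMatch
>>> metric = TextMatch()
>>> metric.update(["Hello", "world"], ["hello", "world"])
>>> metric.summary()
{'raw': 0.5, 'caseless': 1.0, 'anyascii': 0.5, 'unicase': 1.0}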
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
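A quick sanity check of box_iou (a sketch): a 100x100 box against a fully contained 70x70 box gives an intersection of 4900 over a union of 10000, i.e. an IoU of 0.49, while a disjoint box gives 0:

>>> import numpy as np
>>> from doctr.utils.metrics import box_iou
>>> iou = box_iou(np.array([[0, 0, 100, 100]]), np.array([[0, 0, 70, 70], [110, 95, 200, 150]]))
>>> iou  # roughly array([[0.49, 0.]], dtype=float32)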
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
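A minimal check with two axis-aligned squares expressed as 4-point polygons (a sketch; shapely is required, as in the implementation above): the squares overlap on a 1x1 area out of a union of 7, so the IoU is 1/7 ≈ 0.14:

>>> import numpy as np
>>> from doctr.utils.metrics import polygon_iou
>>> polys_1 = np.array([[[0, 0], [2, 0], [2, 2], [0, 2]]], dtype=np.float32)
>>> polys_2 = np.array([[[1, 1], [3, 1], [3, 3], [1, 3]]], dtype=np.float32)
>>> polygon_iou(polys_1, polys_2)  # roughly array([[0.1429]], dtype=float32)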
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
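A short sketch of the suppression loop above: the second box overlaps the top-scoring one with an IoU of about 0.68 and is dropped, while the disjoint third box survives.

>>> import numpy as np
>>> from doctr.utils.metrics import nms
>>> boxes = np.array([
...     [0, 0, 100, 100, 0.9],      # highest score, kept
...     [10, 10, 110, 110, 0.8],    # IoU ~0.68 with the first box, suppressed
...     [200, 200, 300, 300, 0.7],  # disjoint, kept
... ])
>>> nms(boxes, thresh=0.5)  # -> [0, 2]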
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
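Running the docstring example through the code above (a sketch): the best pair only reaches an IoU of 0.49, below the 0.5 threshold, so no match is counted and the mean IoU averages 0.49 over the two predictions:

>>> import numpy as np
>>> from doctr.utils import LocalizationConfusion
>>> metric = LocalizationConfusion(iou_thresh=0.5)
>>> metric.update(np.array([[0, 0, 100, 100]]), np.array([[0, 0, 70, 70], [110, 95, 200, 150]]))
>>> metric.summary()  # -> (0.0, 0.0, 0.24)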
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
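The same geometry run through OCRMetric (a sketch): since no box pair passes the IoU threshold, no transcript comparison happens and every recall/precision entry stays at zero:

>>> import numpy as np
>>> from doctr.utils import OCRMetric
>>> metric = OCRMetric(iou_thresh=0.5)
>>> metric.update(np.array([[0, 0, 100, 100]]), np.array([[0, 0, 70, 70], [110, 95, 200, 150]]),
...               ["hello"], ["hello", "world"])
>>> recall, precision, mean_iou = metric.summary()
>>> recall["raw"], precision["raw"], mean_iou  # -> (0.0, 0.0, 0.24)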
+
+
+
+
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
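And the docstring example for DetectionMetric worked out (a sketch; the class labels are never compared here because no geometric match survives the threshold):

>>> import numpy as np
>>> from doctr.utils import DetectionMetric
>>> metric = DetectionMetric(iou_thresh=0.5)
>>> metric.update(np.array([[0, 0, 100, 100]]), np.array([[0, 0, 70, 70], [110, 95, 200, 150]]),
...               np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
>>> metric.summary()  # -> (0.0, 0.0, 0.24)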
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
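A sketch of the coordinate conversion performed above: a relative geometry ((0.1, 0.1), (0.4, 0.3)) on a page of dimensions (height=200, width=300) becomes an absolute rectangle at (30, 20) of size 90x40:

>>> from doctr.utils.visualization import rect_patch
>>> patch = rect_patch(((0.1, 0.1), (0.4, 0.3)), (200, 300), label="word")
>>> patch.get_xy(), patch.get_width(), patch.get_height()  # roughly ((30.0, 20.0), 90.0, 40.0)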
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
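A short sketch of the dispatch above: a 2-point tuple yields a Rectangle, while a (4, 2) array yields a Polygon (note that polygon_patch scales its input array in place):

>>> import numpy as np
>>> from doctr.utils.visualization import create_obj_patch
>>> rect = create_obj_patch(((0.1, 0.1), (0.4, 0.3)), (200, 300))  # a matplotlib.patches.Rectangle
>>> poly = create_obj_patch(np.array([[0.2, 0.1], [0.5, 0.2], [0.45, 0.4], [0.15, 0.3]]), (200, 300))  # a matplotlib.patches.Polygon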
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
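For instance (a sketch; the exact values vary because lightness and saturation are randomized):

>>> from doctr.utils.visualization import get_colors
>>> colors = get_colors(3)  # three hue-spaced RGB tuples
>>> len(colors), all(len(c) == 3 for c in colors)
(3, True)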
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mlp Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
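Building on the docstring example above, the returned Figure can also be rendered non-interactively and saved to disk (a sketch; the filename is illustrative):

>>> fig = visualize_page(out[0].pages[0].export(), input_page, interactive=False, add_labels=True)
>>> fig.savefig("page_viz.png", dpi=150, bbox_inches="tight")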
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mlp Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
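A minimal usage sketch for draw_boxes, drawing two relative boxes on a blank uint8 image (cv2 and matplotlib are required, as in the implementation above):

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from doctr.utils.visualization import draw_boxes
>>> image = np.zeros((200, 300, 3), dtype=np.uint8)
>>> boxes = np.array([[0.1, 0.1, 0.4, 0.3], [0.5, 0.5, 0.9, 0.8]])  # relative (xmin, ymin, xmax, ymax)
>>> draw_boxes(boxes, image, color=(0, 255, 0))
>>> plt.show()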
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-..autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developper mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Either performed at once or separately, to each task corresponds a type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors lets you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, we warm-up the model and then we measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection will be used to produces cropped images that will be passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedure. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module regroups non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model performances.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
- doctr.datasets - docTR documentation
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before passing it to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-Name           size   characters
-
-digits         10     0123456789
-ascii_letters  52     abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-punctuation    32     !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-currency       5      £€¥¢฿
-latin          96     0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-french         154    0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
-
-
-
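-A sketch of a call (vocab and values are illustrative):
->>> from doctr.datasets import encode_sequences
->>> encoded = encode_sequences(sequences=["12", "345"], vocab="0123456789", target_size=5)
-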
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
- doctr.documents - docTR documentation
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size
-
-
-
-
-
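-A sketch of building one manually (the values are illustrative):
->>> from doctr.documents import Word
->>> word = Word(value="invoice", confidence=0.94, geometry=((0.1, 0.1), (0.3, 0.15)))
-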
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (for instance, on a two-column page, words lying at the same height but in different columns form two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert its pages into images in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF, returned as a bytes stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
Args:¶<
-
+
diff --git a/v0.1.0/modules/utils.html b/v0.1.0/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.0/modules/utils.html
+++ b/v0.1.0/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.0/notebooks.html b/v0.1.0/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.0/notebooks.html
+++ b/v0.1.0/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
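Note on the searchtools.js change above: each search result widens from a six-element to a seven-element array, and the match type is recorded both in the tuple (via the new SearchResultKind values) and as a kind-* class on the result's <li>. A minimal sketch of what the optional Scorer.score hook now receives, assuming the tuple layout and kind strings shown in the diff (the +15/+5 boosts are illustrative values, not Sphinx defaults):

// Sketch only: mirrors the destructuring in the commented Scorer template above.
// The seven-element layout and the kind strings ("title", "index", "object",
// "text") are taken from the diff; the numeric boosts are made-up examples.
function scoreResult(result) {
  const [docname, title, anchor, descr, score, filename, kind] = result;
  if (kind === "title") return score + 15;  // surface page-title matches first
  if (kind === "object") return score + 5;  // then documented API objects
  return score;                             // leave text and index hits unchanged
}

// Example with a result shaped like the ones searchtools.js builds:
scoreResult(["getting_started/installing", "Installation", "", null, 5, "installing.html", "title"]); // -> 20

Because each result <li> now also carries a kind-<type> class, a theme can restyle, say, title matches purely from its stylesheet (e.g. a selector on li.kind-title) with no extra JavaScript.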
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
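For context on the CORD hunk above: the rewrite drops the TensorFlow-specific `sample_transforms` hook (and the `__getitem__`/`collate_fn` overrides) in favour of framework-agnostic `use_polygons`, `recognition_task` and `detection_task` flags. A minimal usage sketch, with the import path and `train`/`download` arguments taken from the docstring example and everything else illustrative:

# Sketch only: exercises the flags introduced in the CORD hunk above.
from doctr.datasets import CORD

# Default target: straight boxes plus labels in a dict, as built in the
# final `else` branch of the loop above
train_set = CORD(train=True, download=True)
img, target = train_set[0]  # target -> {"boxes": ndarray, "labels": [str, ...]}

# Polygon targets: each box becomes a (4, 2) array of corner coordinates
rotated_set = CORD(train=True, download=True, use_polygons=True)

# Recognition targets: (crop, text) pairs cut out of each page image
reco_set = CORD(train=True, download=True, recognition_task=True)
crop, text = reco_set[0]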
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
- doctr.datasets.core - docTR documentation
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
\ No newline at end of file
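The deleted `core.py` shown above carried the download-and-extract plumbing that the new `datasets` module takes over. A hypothetical subclass, sketched against the removed constructor signature (the URL and file name below are placeholders, not a real dataset):

# Sketch only: a hypothetical subclass of the removed VisionDataset helper,
# showing how its constructor wired caching, download and extraction.
from doctr.datasets.core import VisionDataset  # module removed in this diff

class MyZipDataset(VisionDataset):
    def __init__(self, download: bool = False) -> None:
        super().__init__(
            url="https://example.com/my_dataset.zip",  # placeholder URL
            file_name="my_dataset.zip",
            file_hash=None,           # set a SHA256 string to verify the download
            extract_archive=True,     # unzip under ~/.cache/doctr/datasets
            download=download,        # a missing file raises unless download=True
        )
        # After super().__init__, self._root points at the extracted folder;
        # a subclass then populates self.data with (sample, target) entries.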
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
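The FUNSD hunk follows the same pattern as CORD. A short sketch of the two task-specific modes it adds, with the import path from the docstring example and the rest illustrative:

# Sketch only: the task-specific modes added to FUNSD in the hunk above.
from doctr.datasets import FUNSD

# Detection target: one (N, 4) float32 box array per image
det_set = FUNSD(train=True, download=True, detection_task=True)
img, boxes = det_set[0]

# Recognition target: (crop, text) pairs; crops whose text contains the
# checkbox glyphs listed in the loop above are filtered out
reco_set = FUNSD(train=False, download=True, recognition_task=True)
crop, text = reco_set[0]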
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before passing them to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
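To make the `DataLoader` changes above concrete: the `workers`/`multithread_exec` path is gone, batches are now built with a plain `map`, and the class gains a `collate_fn` override plus a `__len__`. A minimal sketch, with the dataset setup taken from the docstring example and the custom collate function purely illustrative:

# Sketch only: exercising the collate_fn hook and __len__ added above.
import tensorflow as tf

from doctr.datasets import CORD, DataLoader

def stack_images_keep_targets(samples):
    # The default collate tf.stack()s every field; this variant stacks the
    # images and leaves the per-sample target dicts as a plain list.
    images, targets = zip(*samples)
    return tf.stack(images, axis=0), list(targets)

train_set = CORD(train=True, download=True)
train_loader = DataLoader(train_set, batch_size=32, collate_fn=stack_images_keep_targets)
print(len(train_loader))  # valid now that DataLoader defines __len__
images, targets = next(iter(train_loader))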
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
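Tying the refactored SROIE interface together, usage follows the updated docstring; note that `recognition_task` and `detection_task` are mutually exclusive, as enforced by the ValueError above (a sketch, assuming the archives can be downloaded):

from doctr.datasets import SROIE

# Default: full images with boxes and label strings
train_set = SROIE(train=True, download=True)
img, target = train_set[0]  # target is {"boxes": ndarray, "labels": [...]}

# Recognition variant: pre-cropped word images paired with their text
rec_set = SROIE(train=True, download=True, recognition_task=True)
crop, label = rec_set[0]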
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
doctr.datasets.svhn - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
doctr.datasets.svt - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
doctr.datasets.synthtext - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionnary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char still not in vocab, return unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
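To make the padding rules above concrete: EOS is appended to each word, PAD fills the remainder, and SOS is rolled onto the front. A worked toy example (the symbol values 3/4/5 are illustrative, not doctr defaults):

from doctr.datasets.utils import encode_sequences

vocab = "abc"  # indices 0..2
out = encode_sequences(["ab", "c"], vocab, eos=3, sos=4, pad=5)
# max word length 2, +1 EOS, +1 SOS, +1 PAD slot -> target_size 5
# "ab" -> [4, 0, 1, 3, 5]   and   "c" -> [4, 2, 3, 5, 5]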
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
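The helper above takes absolute pixel coordinates, either straight boxes of shape (N, 4) or polygons of shape (N, 4, 2); a sketch with a hypothetical image path:

import numpy as np
from doctr.datasets.utils import crop_bboxes_from_image

# Two straight boxes: (xmin, ymin, xmax, ymax) in pixels
boxes = np.array([[10, 10, 110, 40], [20, 60, 220, 100]], dtype=np.float32)
crops = crop_bboxes_from_image("path/to/receipt.jpg", geoms=boxes)
assert len(crops) == 2  # one H x W x 3 ndarray per box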
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
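And a short sketch of the multiclass pre-transform above, which groups polygons under their class name and converts them to relative coordinates (the class names are made up; `get_img_shape` expects the active backend's tensor layout, so the numpy array here is purely illustrative):

import numpy as np
from doctr.datasets.utils import pre_transform_multiclass

img = np.zeros((100, 200, 3), dtype=np.uint8)  # H x W x 3
polys = np.array(
    [[[0, 0], [50, 0], [50, 20], [0, 20]],
     [[10, 30], [90, 30], [90, 60], [10, 60]]],
    dtype=np.float32,
)
_, boxes_dict = pre_transform_multiclass(img, (polys, ["words", "words"]))
# boxes_dict == {"words": ndarray of shape (2, 4, 2) in relative coords}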
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
doctr.datasets.wildreceipt - docTR documentation
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- doctr.documents.elements - docTR documentation
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
\ No newline at end of file
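For reference, the element hierarchy deleted above composed bottom-up, words into lines into blocks into pages; a sketch against the removed v0.2.0 API (geometries are relative ((xmin, ymin), (xmax, ymax)) pairs):

from doctr.documents.elements import Word, Line, Block, Page, Document

w1 = Word("Hello", 0.99, ((0.10, 0.10), (0.30, 0.15)))
w2 = Word("world", 0.98, ((0.32, 0.10), (0.50, 0.15)))
line = Line([w1, w2])        # geometry resolved from its words
block = Block(lines=[line])  # geometry resolved from lines + artefacts
page = Page([block], page_idx=0, dimensions=(595, 842))
doc = Document(pages=[page])
print(doc.render())          # "Hello world"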
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- doctr.documents.reader - docTR documentation
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- a fitz.Document wrapping the PDF pages
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 pdf;
- to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
-
-
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
\ No newline at end of file
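The deleted reader module was typically driven through DocumentFile, per its own docstrings; a usage sketch against the removed v0.2.0 API (paths are placeholders):

from doctr.documents import DocumentFile

pages = DocumentFile.from_images("path/to/page1.png")  # list of H x W x 3 ndarrays
pdf = DocumentFile.from_pdf("path/to/doc.pdf")         # PDF wrapper around fitz
images = pdf.as_images()                               # one ndarray per page
words = pdf.get_words()                                # per page: [(bbox, value), ...]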
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
doctr.io.elements - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
doctr.io.html - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
doctr.io.image.base - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
doctr.io.image.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
doctr.io.pdf - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
doctr.io.reader - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.mobilenet.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.resnet.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.textnet.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vgg.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vit.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
doctr.models.classification.zoo - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to unshrink polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: The first parameter.
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 1, -1):
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature maps is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon : array of coord., to draw the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=np.bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionaries where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Tuple[int, int, int] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
\ No newline at end of file
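The hard-negative mining in the loss above keeps every positive pixel but only the top-k hardest negatives, with k capped at 3x the positive count. A minimal standalone sketch of that step on synthetic tensors (all names illustrative, not part of docTR):

import tensorflow as tf

# Per-pixel BCE values and binary targets, already flattened and masked
bce = tf.random.uniform([1000])
target = tf.cast(tf.random.uniform([1000]) > 0.9, tf.float32)  # sparse positives

pos_count = tf.reduce_sum(target)
neg_count = tf.minimum(tf.reduce_sum(1.0 - target), 3.0 * pos_count)

# Keep all positive losses, but only the k hardest negative losses
neg_losses, _ = tf.nn.top_k(bce * (1.0 - target), k=tf.cast(neg_count, tf.int32))
balanced_bce = (tf.reduce_sum(bce * target) + tf.reduce_sum(neg_losses)) / (
    pos_count + neg_count + 1e-6
)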
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
-        bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
-            pred: probability map output by the LinkNet model
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
-        for label in range(1, label_num):  # connectedComponents labels span [0, label_num)
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
-        seg_target = np.zeros(output_shape, dtype=bool)  # plain bool: np.bool is deprecated
-        seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
-        """Build a batch of ground truths and masks from each image's box list,
-        then compute the loss from the probability map, the targets and the masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
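Each decoder block upsamples by 2 and outputs the channel count of the previous encoder stage, which is what makes the element-wise sums (y_4 + x_3, and so on) shape-compatible. A quick sanity check of that arithmetic for a 512x512 input, assuming the stem's stride-4 output (illustrative only):

# Stem: 512x512x3 -> 128x128x64; each ResnetStage then halves H and W
h = w = 128
enc_channels = [64, 128, 256, 512]
enc_shapes = []
for c in enc_channels:
    h, w = h // 2, w // 2
    enc_shapes.append((h, w, c))
print(enc_shapes)  # [(64, 64, 64), (32, 32, 128), (16, 16, 256), (8, 8, 512)]
# decoder_4: (8, 8, 512) -> (16, 16, 256), matching encoder_3's output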
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
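The rewritten zoo accepts either an architecture name or an already-instantiated model. A short usage sketch of the string form, assuming the pretrained weights can be downloaded:

import numpy as np
from doctr.models import detection_predictor

predictor = detection_predictor(
    arch="db_resnet50",
    pretrained=True,
    assume_straight_pages=True,
    batch_size=2,
)
page = (255 * np.random.rand(1024, 1024, 3)).astype(np.uint8)
out = predictor([page])  # one entry of detected boxes per input page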
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
-
\ No newline at end of file
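All three helpers return the serialized model as bytes. A sketch of consuming those bytes with the TFLite interpreter, assuming a float32-input model (`serialized_model` stands for the output of convert_to_tflite or convert_to_fp16 above):

import numpy as np
import tensorflow as tf

# serialized_model: bytes produced by one of the converters above
interpreter = tf.lite.Interpreter(model_content=serialized_model)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy tensor matching the expected input shape
dummy = np.random.rand(*input_details[0]["shape"]).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
result = interpreter.get_tensor(output_details[0]["index"])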
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
-        Performs CTC decoding of the raw output, then maps the decoded indices
-        back to characters with the label_to_idx mapping dictionary
- with label_to_idx mapping dictionnary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
-        Args:
-            model_output: predicted logits of the model
-            target: list of ground-truth words (encoded internally via compute_target)
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
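The CTC decoding above relies on TensorFlow's greedy decoder; the underlying rule (collapse repeats, then drop blanks) is small enough to spell out in NumPy. A self-contained sketch, with the blank placed at index len(vocab) as in the model head (names illustrative):

import numpy as np

def greedy_ctc_decode(logits: np.ndarray, vocab: str) -> str:
    # Collapse consecutive repeats, then drop the blank (index == len(vocab))
    blank = len(vocab)
    best = logits.argmax(axis=-1)  # (seq_len,)
    collapsed = [k for i, k in enumerate(best) if i == 0 or k != best[i - 1]]
    return "".join(vocab[k] for k in collapsed if k != blank)

vocab = "abc"
logits = np.array([
    [0.1, 0.2, 0.1, 0.6],  # blank
    [0.9, 0.0, 0.0, 0.1],  # 'a'
    [0.8, 0.1, 0.0, 0.1],  # 'a' (repeat, merged)
    [0.0, 0.9, 0.0, 0.1],  # 'b'
])
print(greedy_ctc_decode(logits, vocab))  # "ab"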
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
-        # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embeded_symbol: shape (N, embedding_units)
- embeded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embeded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
-            # logits: shape (N, rnn_units), glimpse: shape (N, feature_units)
-            logits = tf.concat([logits, glimpse], axis=-1)
-            # shape (N, rnn_units + feature_units) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
-    """Implements a SAR architecture as described in `"Show, Attend and Read: A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
-    """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read: A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
-    """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read: A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example:
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
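The loss above masks every timestep after <eos> before averaging, so padding never contributes. A tiny numeric sketch of that masking step, with made-up values:

import tensorflow as tf

# Per-timestep cross-entropy for 2 sequences padded to length 4
cce = tf.constant([[0.5, 0.4, 0.3, 0.2],
                   [0.6, 0.1, 0.9, 0.7]])
seq_len = tf.constant([2, 3])  # valid steps, <eos> included

mask_2d = tf.sequence_mask(seq_len, maxlen=4)  # True on valid steps
masked = tf.where(mask_2d, cce, tf.zeros_like(cce))
loss = tf.reduce_sum(masked, axis=1) / tf.cast(seq_len, tf.float32)
print(loss.numpy())  # [0.45, 0.5333...]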
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
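Mirroring the detection zoo, the recognition predictor also takes a name or a model instance. A short usage sketch of the string form, assuming the pretrained weights are reachable:

import numpy as np
from doctr.models import recognition_predictor

reco = recognition_predictor("crnn_vgg16_bn", pretrained=True, batch_size=128)
crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)  # a single word crop
words = reco([crop])  # one (word, confidence) pair per crop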
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the general page orientation
+ based on the median line orientation of the segmentation map.
+ The page is then rotated before being passed again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `KIEPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
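A brief usage sketch for the KIE predictor: unlike the OCR predictor, each page exposes a `predictions` dict mapping a class name to its detected elements (the same structure consumed by `visualize_kie_page` further below):

>>> import numpy as np
>>> from doctr.models import kie_predictor
>>> model = kie_predictor(pretrained=True)
>>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([input_page])
>>> out.pages[0].predictions  # dict: class name -> list of predictions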
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Brightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- p: probability to apply transformation
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Contrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Saturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Hue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Gamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = JpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = OneOf([JpegQuality(), Gamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = RandomApply(Gamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
-
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
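A quick check of string_match against the definitions above; anyascii leaves plain ASCII untouched, so only the caseless and unicase levels match here:

>>> from doctr.utils.metrics import string_match
>>> string_match("Hello", "hello")
(False, True, False, True)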
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
 gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
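Continuing the docstring example above, the summary works out as follows ('Hello' vs 'hello' only matches once case is ignored, while 'world' matches at every level):

>>> from doctr.utils import TextMatch
>>> metric = TextMatch()
>>> metric.update(['Hello', 'world'], ['hello', 'world'])
>>> metric.summary()
{'raw': 0.5, 'caseless': 1.0, 'anyascii': 0.5, 'unicase': 1.0}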
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
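A small numeric check of box_iou: the intersection is 70 x 70 = 4,900 and the union is 10,000, hence an IoU of 0.49:

>>> import numpy as np
>>> from doctr.utils.metrics import box_iou
>>> box_iou(np.array([[0, 0, 100, 100]]), np.array([[0, 0, 70, 70]]))
array([[0.49]], dtype=float32)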
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
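As a sanity check, two unit squares offset by half a side share an intersection of 0.5 over a union of 1.5:

>>> import numpy as np
>>> from doctr.utils.metrics import polygon_iou
>>> sq = np.array([[[0, 0], [1, 0], [1, 1], [0, 1]]], dtype=np.float32)
>>> polygon_iou(sq, sq + np.array([0.5, 0.0], dtype=np.float32))  # 0.5 / 1.5 = 1/3
array([[0.33333334]], dtype=float32)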
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
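A quick check of nms with a 0.5 threshold: the second box overlaps the first with an IoU of 0.81 and is suppressed, while the disjoint third box survives:

>>> import numpy as np
>>> from doctr.utils.metrics import nms
>>> boxes = np.array([
...     [0, 0, 100, 100, 0.9],
...     [5, 5, 95, 95, 0.8],
...     [200, 200, 300, 300, 0.7],
... ])
>>> nms(boxes, thresh=0.5)  # -> indices [0, 2]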
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
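A small worked example with one ground truth and two predictions, only the first of which lines up (recall = 1/1, precision = 1/2, mean IoU = round(1.0 / 2, 2)):

>>> import numpy as np
>>> from doctr.utils import LocalizationConfusion
>>> metric = LocalizationConfusion(iou_thresh=0.5)
>>> metric.update(np.array([[0, 0, 100, 100]]), np.array([[0, 0, 100, 100], [200, 200, 300, 300]]))
>>> metric.summary()
(1.0, 0.5, 0.5)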
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ ...               ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ ...               np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall, precision and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
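DetectionMetric follows the same pattern as OCRMetric but compares class indices instead of strings; a minimal sketch with a single perfect match:

>>> import numpy as np
>>> from doctr.utils import DetectionMetric
>>> metric = DetectionMetric(iou_thresh=0.5)
>>> metric.update(np.array([[0, 0, 100, 100]]), np.array([[0, 0, 100, 100]]),
...               np.array([0]), np.array([0]))
>>> metric.summary()
(1.0, 1.0, 1.0)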
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
 image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mlp Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mlp Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
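A short usage sketch (boxes are relative; the image is drawn on with OpenCV and displayed through matplotlib):

>>> import numpy as np
>>> from doctr.utils.visualization import draw_boxes
>>> image = np.zeros((100, 200, 3), dtype=np.uint8)
>>> boxes = np.array([[0.1, 0.2, 0.5, 0.8]])  # relative (xmin, ymin, xmax, ymax)
>>> draw_boxes(boxes, image, color=(0, 255, 0))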
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-..autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
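As a minimal sketch of the function above (the vocab string and target size are arbitrary choices for illustration, not DocTR defaults):

>>> from doctr.datasets import encode_sequences
>>> vocab = "0123456789abcdefghijklmnopqrstuvwxyz"  # any ordered character set can serve as a vocab
>>> encoded = encode_sequences(["hello", "world42"], vocab=vocab, target_size=16)
>>> # encoded is a (2, 16) integer array, one padded row per input sequence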
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same height in different columns form two separate Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
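To tie these classes together, a short sketch (file paths are placeholders):

>>> from doctr.documents import DocumentFile
>>> pages = DocumentFile.from_images(["path/to/page1.png"])  # list of H x W x 3 numpy arrays
>>> pdf = DocumentFile.from_pdf("path/to/doc.pdf")  # PDF object
>>> words = pdf.get_words()  # per-page lists of (bounding box, value) tuples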
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architecture's speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developer mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Whether performed jointly or separately, each task calls for its own type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combined hold 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up, then measure its average speed over 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.x12large AWS instance (CPU Xeon Platinum 8275L) to run the experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors let you pass numpy images as inputs and get structured information back.
-
-.. autofunction:: doctr.models.detection.detection_predictor
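A minimal usage sketch mirroring the predictor interface described above (the random tensor stands in for a real page image):

>>> import numpy as np
>>> from doctr.models import detection_predictor
>>> model = detection_predictor("db_resnet50", pretrained=True)
>>> input_page = (255 * np.random.rand(1024, 1024, 3)).astype(np.uint8)
>>> out = model([input_page])  # localization results for each input page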
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined hold 30595 word-level crops, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 32, 128, 3] as a warm-up, then measure its average speed over 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.x12large AWS instance (CPU Xeon Platinum 8275L) to run the experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Predictors combine the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
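A minimal usage sketch on a single word-level crop (a random stand-in input, not a real crop):

>>> import numpy as np
>>> from doctr.models import recognition_predictor
>>> model = recognition_predictor("crnn_vgg16_bn", pretrained=True)
>>> input_crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)
>>> out = model([input_crop])  # decoded string for each crop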
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined hold 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed as follows: we instantiate the predictor, warm up the model, and then measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.x12large AWS instance (CPU Xeon Platinum 8275L) to run the experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-These architectures involve one stage of text detection and one stage of text recognition: the detection output is used to produce cropped images that are passed to the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
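A minimal end-to-end sketch combining the two stages (the file path is a placeholder):

>>> from doctr.documents import DocumentFile
>>> from doctr.models import ocr_predictor
>>> model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
>>> doc = DocumentFile.from_pdf("path/to/doc.pdf").as_images()
>>> result = model(doc)  # a Document: Pages -> Blocks -> Lines -> Words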
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both the training and inference procedures. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
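A minimal composition sketch with the TensorFlow backend (the normalization statistics below are illustrative values, not the training statistics):

>>> import tensorflow as tf
>>> from doctr.transforms import Compose, Resize, Normalize
>>> transfo = Compose([Resize((32, 128)), Normalize(mean=(0.5, 0.5, 0.5), std=(1.0, 1.0, 1.0))])
>>> out = transfo(tf.random.uniform(shape=[64, 256, 3], maxval=1, dtype=tf.float32))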
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module gathers non-core features that complement the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model's performance.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
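A minimal sketch for the localization metric (the boxes are made-up coordinates; the exact content of summary() may vary across releases):

>>> import numpy as np
>>> from doctr.utils.metrics import LocalizationConfusion
>>> metric = LocalizationConfusion(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70]]))  # (ground truths, predictions)
>>> metric.summary()  # aggregated recall/precision figures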
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its own way of loading a sample, but batch aggregation and the underlying iterator are handled by a dedicated object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = CORD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-¶
-
-
-
-
-
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as a mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page's size
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same height in different columns form two separate Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and the confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into images in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF byte stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architecture's speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
docTR Notebooks
-
+
diff --git a/v0.1.0/search.html b/v0.1.0/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.0/search.html
+++ b/v0.1.0/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.0/searchindex.js
+++ b/v0.1.0/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
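The searchindex.js diff above replaces the whole minified index in a single line: one `Search.setIndex({...})` call whose `alltitles` table maps each section title to `[document, anchor]` pairs, resolved against the parallel `docnames` and `filenames` arrays. As a rough illustration of how such an entry maps to a page URL (the helper name and the `.html` URL scheme are assumptions for this sketch, not docTR or Sphinx API):

```js
// Illustrative sketch (not part of the diff): resolving an "alltitles" entry
// from the Search.setIndex payload above to a page URL. The index structure
// ({alltitles, docnames}) is taken from the payload; the helper name and the
// ".html" scheme are assumptions.
function resolveTitle(index, title) {
  const hits = index.alltitles[title] || [];
  return hits.map(([docId, anchor]) => {
    const page = index.docnames[docId] + ".html";
    return anchor ? `${page}#${anchor}` : page;
  });
}

// resolveTitle(index, "AWS Lambda")    -> ["using_doctr/running_on_aws.html"]
// resolveTitle(index, "1. Correction") -> ["contributing/code_of_conduct.html#correction"]
```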
diff --git a/v0.1.0/using_doctr/custom_models_training.html b/v0.1.0/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.0/using_doctr/custom_models_training.html
+++ b/v0.1.0/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.0/using_doctr/running_on_aws.html b/v0.1.0/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.0/using_doctr/running_on_aws.html
+++ b/v0.1.0/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
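The `stopwords` array retained at the end of this hunk is the language-specific data that searchtools.js filters query terms against before scoring. A minimal sketch of that filtering step, assuming the `stopwords` global from language_data.js above (the tokenizer and function name are invented for illustration):

```js
// Illustrative sketch: dropping stopwords from a query before scoring.
// `stopwords` is the array defined in language_data.js; everything else
// here is an assumption for demonstration purposes.
const STOPWORDS = new Set(stopwords);

function significantTerms(query) {
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter((term) => term.length > 0 && !STOPWORDS.has(term));
}

// significantTerms("how to train the model") -> ["how", "train", "model"]
```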
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
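Taken together, the searchtools.js changes above widen each result tuple from six to seven elements, appending a `kind` drawn from the new `SearchResultKind` enum, and `_displayItem` now tags every rendered `<li>` with a `kind-${kind}` class so themes can style result types. A hedged sketch of what a consumer sees after this change (the concrete values and the theme-side class are invented; only the tuple layout, `SearchResultKind`, and the `kind-` prefix come from the diff):

```js
// Illustrative sketch: the seven-element result tuple introduced by this
// change, and how a theme could react to the kind-* class that
// _displayItem now adds. All concrete values below are made up.
const result = [
  "using_doctr/using_models",      // docname
  "Choosing the right model",      // title
  "",                              // anchor
  null,                            // descr
  15,                              // score
  "using_doctr/using_models.rst",  // filename
  SearchResultKind.title,          // kind -- the new seventh element
];

const [docname, title, anchor, descr, score, filename, kind] = result;

// Theme-side styling hook (selector and class name are assumptions):
document
  .querySelectorAll(`#search-results li.kind-${kind}`)
  .forEach((li) => li.classList.add("search-result-title"));
```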
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶<
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
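For readers scanning the hunk above, here is a short usage sketch of the new CORD constructor flags. It is inferred from the diff alone (the exact sample types returned by indexing depend on the installed docTR version and backend), so treat it as illustrative rather than authoritative:

# Usage sketch for the reworked CORD dataset (assumptions noted inline)
from doctr.datasets import CORD

# Default mode: full samples with both boxes and labels
full_set = CORD(train=True, download=True)
img, target = full_set[0]  # target holds {"boxes": ndarray, "labels": [str, ...]}

# Recognition mode: word crops paired with their transcriptions
reco_set = CORD(train=True, download=True, recognition_task=True)
crop, word = reco_set[0]

# Detection mode: images paired with box targets only
det_set = CORD(train=True, download=True, detection_task=True)

# Enabling both task flags raises the ValueError added in __init__ above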
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
- doctr.datasets.core - docTR documentation
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
doctr.datasets.detection - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
doctr.datasets.doc_artefacts - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
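The new flags in action, sketched (array shapes assumed from the construction of box_targets above):

>>> from doctr.datasets import FUNSD
>>> # rotated-box mode: one (x, y) pair per corner for each word
>>> ds = FUNSD(train=False, download=True, use_polygons=True)
>>> img, target = ds[0]
>>> target["boxes"].shape  # (num_words, 4, 2)
>>> # detection mode: boxes only, labels dropped
>>> det_ds = FUNSD(train=False, download=True, detection_task=True)
>>> img, boxes = det_ds[0]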
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
doctr.datasets.generator.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
doctr.datasets.ic03 - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
doctr.datasets.ic13 - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
doctr.datasets.iiit5k - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
doctr.datasets.iiithws - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
doctr.datasets.imgur5k - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
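A sketch of the reworked loader, exercising the new collate_fn argument and __len__ (batch contents depend on the dataset's own collate logic):

>>> from doctr.datasets import CORD, DataLoader
>>> train_set = CORD(train=True, download=True)
>>> # an explicit collate_fn now takes precedence over the dataset's own
>>> loader = DataLoader(train_set, batch_size=16, collate_fn=lambda samples: tuple(zip(*samples)))
>>> len(loader)  # number of batches per epoch
>>> images, targets = next(iter(loader))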
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
doctr.datasets.mjsynth - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
doctr.datasets.ocr - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
doctr.datasets.recognition - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
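The 8-value coordinate handling above, traced on one arbitrary annotation row:

>>> import numpy as np
>>> row = ["10", "20", "50", "20", "50", "40", "10", "40"]  # x1, y1, ..., x4, y4
>>> coords = np.array(list(map(int, row)), dtype=np.float32).reshape((4, 2))[None]
>>> # straight-box reduction: per-polygon min/max over the four corners
>>> np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
array([[10., 20., 50., 40.]], dtype=float32)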
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
doctr.datasets.svhn - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
doctr.datasets.svt - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
doctr.datasets.synthtext - docTR documentation
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char still not in vocab, return unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
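A worked example of the padding logic above, with eos and pad chosen outside the vocab indices:

>>> import numpy as np
>>> from doctr.datasets.utils import encode_sequences
>>> # "ab" -> [0, 1], then EOS (3), then PAD (4) up to the target size
>>> encode_sequences(["ab", "c"], vocab="abc", eos=3, pad=4)
array([[0, 1, 3, 4],
       [2, 3, 4, 4]], dtype=int32)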
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
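The two accepted geometry layouts, sketched with a hypothetical image path and absolute pixel coordinates:

>>> import numpy as np
>>> from doctr.datasets.utils import crop_bboxes_from_image
>>> # straight boxes, shape (N, 4): xmin, ymin, xmax, ymax
>>> crops = crop_bboxes_from_image("page.jpg", np.array([[10, 10, 50, 30]]))
>>> crops[0].shape  # (20, 40, 3)
>>> # rotated boxes, shape (N, 4, 2): one (x, y) pair per corner
>>> polys = np.array([[[10, 10], [50, 10], [50, 30], [10, 30]]])
>>> crops = crop_bboxes_from_image("page.jpg", polys)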
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
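How the grouping plays out for a single-class target (channel-last image assumed, so get_img_shape resolves to (H, W)):

>>> import numpy as np
>>> import tensorflow as tf
>>> from doctr.datasets.utils import pre_transform_multiclass
>>> img = tf.zeros((64, 64, 3))
>>> polys = np.array([[[0, 0], [32, 0], [32, 16], [0, 16]]], dtype=np.float32)
>>> _, boxes = pre_transform_multiclass(img, (polys, ["words"]))
>>> boxes["words"].shape  # (1, 4, 2), coordinates now relative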
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
doctr.datasets.wildreceipt - docTR documentation
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
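How these elements compose bottom-up, with relative ((xmin, ymin), (xmax, ymax)) geometries:

>>> from doctr.documents.elements import Word, Line, Block, Page, Document
>>> word = Word("hello", 0.99, ((0.1, 0.1), (0.3, 0.15)))
>>> line = Line([word])  # geometry resolved from its words
>>> block = Block(lines=[line])  # geometry resolved from lines and artefacts
>>> page = Page([block], page_idx=0, dimensions=(595, 842))
>>> Document([page]).render()
'hello'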
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document, opened as a fitz.Document
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 pdf;
- if you want to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
-
-
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
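The reading entry points above, side by side (file paths hypothetical):

>>> from doctr.documents import DocumentFile
>>> pages = DocumentFile.from_pdf("doc.pdf").as_images()  # list of H x W x 3 arrays
>>> words = DocumentFile.from_pdf("doc.pdf").get_words()  # per page: [(bbox, value), ...]
>>> pages = DocumentFile.from_images(["page1.png", "page2.png"])
>>> pdf_doc = DocumentFile.from_url("https://www.yoursite.com")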
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
doctr.io.elements - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
doctr.io.html - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
doctr.io.image.base - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
doctr.io.image.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
doctr.io.pdf - docTR documentation
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
doctr.io.reader - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
doctr.models.classification.mobilenet.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
doctr.models.classification.resnet.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
doctr.models.classification.textnet.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
doctr.models.classification.vgg.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
doctr.models.classification.vit.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
doctr.models.classification.zoo - docTR documentation
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
-        unclip_ratio: ratio used to unshrink polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
-        bin_thresh: threshold used to binarize the p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
-            points: polygon vertices, as an (N, 2) array
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
-                # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
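# --- Illustrative sketch (editor's addition, not part of the original file): the
# unclip distance above is area * unclip_ratio / perimeter. For a 100 px square
# with the default unclip_ratio = 1.5 that is 100**2 * 1.5 / 400 = 37.5 px of
# outward offset on every side.
import numpy as np
import pyclipper
from shapely.geometry import Polygon

square = np.array([[0, 0], [100, 0], [100, 100], [0, 100]])
poly = Polygon(square)
distance = poly.area * 1.5 / poly.length  # 37.5
offset = pyclipper.PyclipperOffset()
offset.AddPath(square.tolist(), pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
expanded = offset.Execute(distance)  # expanded contour(s), with rounded corners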
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
-            if _box is None or _box[2] < min_size_box or _box[3] < min_size_box:  # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
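# --- Illustrative sketch (editor's addition, not part of the original file):
# the contour-to-box pipeline above on a synthetic binarized map. One filled
# blob yields one relative (xmin, ymin, xmax, ymax) box.
import cv2
import numpy as np

bitmap = np.zeros((64, 64), dtype=np.uint8)
bitmap[20:40, 10:50] = 1  # a single rectangular "text" blob
contours, _ = cv2.findContours(bitmap, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
x, y, w, h = cv2.boundingRect(contours[0])
height, width = bitmap.shape
box = [x / width, y / height, (x + w) / width, (y + h) / height]  # ~[0.16, 0.31, 0.78, 0.62]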
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
-        # Upsample & sum (top-down, so each level receives the upsampled coarser map)
-        for idx in range(len(results) - 2, -1, -1):
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
-        feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature maps is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
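# --- Illustrative check (editor's addition, not part of the original file):
# for P = (2, 1) against the segment A = (0, 0), B = (4, 0):
#   square_dist_1 = square_dist_2 = 5, square_dist = 16
#   cosin = (16 - 5 - 5) / (2 * 5) = 0.6 >= 0, so no endpoint clamping
#   result = sqrt(5 * 5 * (1 - 0.36) / 16) = 1.0, the perpendicular distance
import numpy as np

d1_sq, d2_sq, d_sq = 5.0, 5.0, 16.0
cosin = (d_sq - d1_sq - d2_sq) / (2 * np.sqrt(d1_sq * d2_sq))
assert np.isclose(np.sqrt(d1_sq * d2_sq * (1 - cosin ** 2) / d_sq), 1.0)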
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon : array of coord., to draw the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
-        seg_mask = np.ones(output_shape, dtype=bool)  # plain bool: np.bool was removed in NumPy 1.24
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
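# --- Illustrative sketch (editor's addition, not part of the original file):
# the approximate binarization inside the dice term above. With steepness 50,
# the sigmoid acts almost like a hard threshold of prob_map at thresh_map
# while staying differentiable, so gradients reach both heads.
import tensorflow as tf

prob_map = tf.constant([0.10, 0.29, 0.31, 0.90])
thresh_map = tf.fill([4], 0.3)
bin_map = 1 / (1 + tf.exp(-50. * (prob_map - thresh_map)))
# ~[0.00, 0.38, 0.62, 1.00]: only values close to the threshold stay soft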
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
-        bin_thresh: threshold used to binarize the p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
-            pred: Pred map from the LinkNet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num + 1):
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
-        seg_target = np.zeros(output_shape, dtype=bool)
-        seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
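# --- Illustrative sketch (editor's addition, not part of the original file):
# the masked BCE reduction above on toy values. Boolean masking drops
# ambiguous or too-small regions from the mean rather than zeroing them.
import tensorflow as tf

logits = tf.constant([2.0, -1.0, 0.5, -3.0])
targets = tf.constant([1.0, 0.0, 1.0, 0.0])
mask = tf.constant([True, True, False, True])  # 3rd location is masked out
loss = tf.math.reduce_mean(
    tf.keras.losses.binary_crossentropy(targets[mask], logits[mask], from_logits=True)
)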
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
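A usage sketch for the updated factory (editor's addition, assuming the predictor API shown in the hunk above): arch now accepts either an architecture name or an already-instantiated detection model, and FAST models are reparameterized on load.

>>> import numpy as np
>>> from doctr.models import detection, detection_predictor
>>> # By name, with the new defaults (FAST architecture, batch_size=2)
>>> predictor = detection_predictor(arch="fast_base", pretrained=True)
>>> # Or by passing a model instance directly
>>> model = detection.db_resnet50(pretrained=True)
>>> predictor = detection_predictor(arch=model, assume_straight_pages=True)
>>> page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = predictor([page])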
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
-
\ No newline at end of file
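The deleted helpers above each returned a serialized TFLite flatbuffer. A minimal sketch of consuming such a buffer (editor's addition; the toy Sequential model is illustrative, and only standard tf.lite interpreter calls are used):

>>> import numpy as np
>>> import tensorflow as tf
>>> from tensorflow.keras import Sequential, layers
>>> model = Sequential([layers.Conv2D(8, 3, activation='relu', input_shape=(32, 32, 3))])
>>> serialized = tf.lite.TFLiteConverter.from_keras_model(model).convert()
>>> interpreter = tf.lite.Interpreter(model_content=serialized)
>>> interpreter.allocate_tensors()
>>> inp = interpreter.get_input_details()[0]
>>> interpreter.set_tensor(inp['index'], np.random.rand(1, 32, 32, 3).astype(np.float32))
>>> interpreter.invoke()
>>> out = interpreter.get_tensor(interpreter.get_output_details()[0]['index'])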
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
-        with the label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
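# --- Illustrative sketch (editor's addition, not part of the original file):
# greedy CTC decoding on a toy 3-class problem where the blank is the last
# class. Per-frame argmaxes [0, 0, blank, 1] collapse to [0, 1] ("ab" with
# vocab "ab"): repeats are merged, then blanks dropped.
import tensorflow as tf

probs = tf.constant([[[0.8, 0.1, 0.1],
                      [0.7, 0.2, 0.1],
                      [0.1, 0.1, 0.8],
                      [0.1, 0.8, 0.1]]])  # (batch=1, time=4, classes=3)
decoded, _ = tf.nn.ctc_greedy_decoder(
    tf.math.log(tf.transpose(probs, perm=[1, 0, 2])),  # time-major log-probs
    tf.fill([1], 4),
    merge_repeated=True,
)
dense = tf.sparse.to_dense(decoded[0])  # [[0, 1]]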
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
-            model_output: predicted logits of the model
-            target: list of ground-truth words for the batch
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
-        # shape (N, H, W, C) -> (N, C): attention-weighted sum over spatial dims
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embeded_symbol: shape (N, embedding_units)
- embeded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embeded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, 1)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + 1) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length: number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
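compute_loss averages the cross-entropy only over the valid timesteps of each word, including its <eos> step. A standalone sketch of the masking logic with toy tensors (illustrative values only)::

    import tensorflow as tf

    logits = tf.random.normal([2, 5, 4])            # (batch, timesteps, vocab + 1)
    gt = tf.constant([[1, 2, 3, 0, 0],
                      [2, 3, 0, 0, 0]])             # index 3 stands in for <eos>
    seq_len = tf.constant([3, 2]) + 1               # word lengths + 1 for <eos>

    cce = tf.nn.softmax_cross_entropy_with_logits(
        tf.one_hot(gt, depth=logits.shape[2]), logits
    )
    mask_2d = tf.sequence_mask(seq_len, tf.shape(logits)[1])
    masked = tf.where(mask_2d, cce, tf.zeros_like(cce))
    # per-sample loss, normalized by the number of unmasked steps
    loss = tf.reduce_sum(masked, axis=1) / tf.cast(seq_len, tf.float32)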
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
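The post-processor above relies on TensorFlow string ops; the same decoding can be expressed in plain Python, which makes the <eos> handling explicit (a sketch with a toy vocabulary, not the library's API)::

    import tensorflow as tf

    vocab = "abc"                                   # <eos> is mapped to index len(vocab)
    logits = tf.random.normal([2, 4, len(vocab) + 1])
    pred = tf.math.argmax(logits, axis=2).numpy()

    words = []
    for seq in pred:
        chars = []
        for idx in seq:
            if idx == len(vocab):                   # stop at the first <eos>
                break
            chars.append(vocab[idx])
        words.append("".join(chars))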
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
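A usage sketch for the updated zoo entry point, assuming a docTR release matching the code above (per the predictor's design, one (word, confidence) pair is expected per crop)::

    import numpy as np
    from doctr.models import recognition_predictor

    predictor = recognition_predictor("crnn_vgg16_bn", pretrained=True, batch_size=64)
    crops = [(255 * np.random.rand(32, 128, 3)).astype(np.uint8)]
    out = predictor(crops)  # e.g. [("hello", 0.98)]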
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
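Since det_arch and reco_arch now accept model instances as well as names, a custom or fine-tuned model can be wired into the predictor directly; a sketch assuming the standard docTR model constructors::

    from doctr.models import crnn_vgg16_bn, db_resnet50, ocr_predictor

    det_model = db_resnet50(pretrained=True)
    reco_model = crnn_vgg16_bn(pretrained=True)
    predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model)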
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example::
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example::
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example::
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example::
- >>> from doctr.transforms import RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
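Taken together, the modules in this (now removed) file compose into augmentation pipelines; a sketch of a typical pipeline, assuming the classes remain importable from doctr.transforms as listed in __all__ above::

    import tensorflow as tf
    from doctr.transforms import (
        Compose, OneOf, RandomApply, RandomGamma, RandomJpegQuality, Resize
    )

    augment = Compose([
        Resize((32, 32)),
        RandomApply(RandomGamma(), p=0.5),
        OneOf([RandomJpegQuality(min_quality=60), RandomGamma()]),
    ])
    out = augment(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))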
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
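Illustrative calls (string_match is defined in doctr.utils.metrics though not exported in __all__; the exact transliterations are assumptions about the anyascii package)::

    from doctr.utils.metrics import string_match

    string_match("Hello", "hello")  # (False, True, True, True)
    # anyascii("€") is assumed to transliterate to "EUR", hence the last two matches
    string_match("EUR", "€")        # (False, False, True, True)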
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
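A worked example of the IoU computation for a single pair of boxes, in the same (xmin, ymin, xmax, ymax) convention (plain NumPy, illustrative values)::

    import numpy as np

    a = np.array([0, 0, 100, 100], dtype=np.float32)
    b = np.array([50, 50, 150, 150], dtype=np.float32)

    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # 50
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # 50
    inter = inter_w * inter_h                              # 2500
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union                                    # 2500 / 17500 ≈ 0.143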
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
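The rotated-box variant delegates the geometry to Shapely; for a single pair of polygons the core computation reduces to the following sketch::

    from shapely.geometry import Polygon

    # two unit squares, the second shifted by 0.5 on both axes
    p1 = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
    p2 = Polygon([(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)])
    inter = p1.intersection(p2).area           # 0.25
    iou = inter / (p1.area + p2.area - inter)  # 0.25 / 1.75 ≈ 0.143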
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
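Usage sketch for nms (scores in the last column; the heavily-overlapping second box is suppressed at the default threshold)::

    import numpy as np
    from doctr.utils.metrics import nms  # exported in __all__ above

    boxes = np.array([
        [0, 0, 100, 100, 0.9],
        [5, 5, 105, 105, 0.8],      # IoU ≈ 0.82 with the first box -> suppressed
        [200, 200, 300, 300, 0.7],
    ])
    keep = nms(boxes, thresh=0.5)   # -> [0, 2]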
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the overall recall & precision for class-aware matches and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (over all detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
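The matching above relies on the Hungarian algorithm; a standalone sketch of that step (iou_mat is a hypothetical (N, M) IoU matrix, as returned by box_iou in doctr.utils.metrics):

>>> import numpy as np
>>> from scipy.optimize import linear_sum_assignment
>>> iou_mat = np.array([[0.6, 0.1], [0.2, 0.7]])  # rows: ground truths, columns: predictions
>>> gt_idx, pred_idx = linear_sum_assignment(-iou_mat)  # negate to maximize the total IoU
>>> is_kept = iou_mat[gt_idx, pred_idx] >= 0.5  # only pairs above the IoU threshold count as matches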
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor was instantiated with preserve_aspect_ratio=True
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
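For illustration, a hedged call to the helper above with hypothetical relative coordinates on a 600x800 page:

>>> patch = rect_patch(((0.1, 0.2), (0.4, 0.3)), (600, 800), label="word", color=(0, 0, 1))
>>> # patch is a matplotlib.patches.Rectangle expressed in absolute pixel coordinates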
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor was instantiated with preserve_aspect_ratio=True
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
+
+
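The dispatcher above picks the patch type from the geometry format; a short sketch with hypothetical geometries:

>>> import numpy as np
>>> straight = ((0.1, 0.1), (0.3, 0.2))  # 2-point box -> rect_patch
>>> rotated = np.array([[0.1, 0.1], [0.3, 0.1], [0.3, 0.2], [0.1, 0.2]])  # (4, 2) array -> polygon_patch
>>> p1 = create_obj_patch(straight, (600, 800))
>>> p2 = create_obj_patch(rotated, (600, 800))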
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
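A quick sanity check (lightness and saturation are randomized, so exact values vary between calls):

>>> colors = get_colors(3)
>>> len(colors), all(len(c) == 3 for c in colors)
(3, True)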
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mplcursors Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mplcursors Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
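A hedged example on a synthetic canvas (values are hypothetical; draw_boxes expects relative coordinates and displays the result with matplotlib):

>>> import numpy as np
>>> canvas = np.full((200, 300, 3), 255, dtype=np.uint8)  # blank white image
>>> boxes = np.array([[0.1, 0.1, 0.4, 0.3], [0.5, 0.5, 0.9, 0.8]], dtype=np.float32)
>>> draw_boxes(boxes, canvas, color=(255, 0, 0))  # draws red rectangles and shows the image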
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
-
-
+
+
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
-
-
+
+
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-..autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developper mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Either performed at once or separately, to each task corresponds a type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors lets you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, we warm-up the model and then we measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection will be used to produces cropped images that will be passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedure. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module regroups non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model performances.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset forPost-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-..autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = CORD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before passing it to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-¶
-
-
-
-
-
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-size (the page's)
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degress and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
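Since the return value is a PDF byte stream rather than decoded pages, it can be saved or passed on as such (sketch with an assumed output path):
>>> from doctr.documents import read_html
>>> pdf_bytes = read_html("https://www.yoursite.com")
>>> with open("page.pdf", "wb") as f:
...     _ = f.write(pdf_bytes)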
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
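Because the file argument also accepts a binary stream, an already-loaded PDF can be passed directly (sketch):
>>> from doctr.documents import DocumentFile
>>> with open("path/to/your/doc.pdf", "rb") as f:
...     doc = DocumentFile.from_pdf(f.read())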
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
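As the result is a PDF document, it can be chained with the PDF methods described below, e.g. to rasterize the fetched page (sketch):
>>> from doctr.documents import DocumentFile
>>> pages = DocumentFile.from_url("https://www.yoursite.com").as_images()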
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
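A single path (or byte string) is also accepted; the result is still a list of pages (sketch):
>>> from doctr.documents import DocumentFile
>>> page = DocumentFile.from_images("path/to/your/page1.png")[0]
>>> page.shape  # (H, W, 3)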
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
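The keyword arguments are forwarded to convert_page_to_numpy; for instance, an output_size argument (an assumption, by analogy with the reading utilities above) would control the rasterized page size:
>>> from doctr.documents import DocumentFile
>>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images(output_size=(1024, 1024))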
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
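Each page entry can be unpacked into relative coordinates and the recognized string (sketch):
>>> from doctr.documents import DocumentFile
>>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
>>> (xmin, ymin, xmax, ymax), value = words[0][0]  # first word of the first page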
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
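The result is indexed by page, each entry holding the artefact boxes found on that page (sketch):
>>> from doctr.documents import DocumentFile
>>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
>>> len(artefacts)  # one entry per page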
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
AWS Lambda
-
+
diff --git a/v0.1.0/using_doctr/sharing_models.html b/v0.1.0/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.0/using_doctr/sharing_models.html
+++ b/v0.1.0/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.0/using_doctr/using_contrib_modules.html b/v0.1.0/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.0/using_doctr/using_contrib_modules.html
+++ b/v0.1.0/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶<
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
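For reference, a minimal usage sketch of the reworked CORD interface above; the flags and their semantics come from the new docstring, while the exact download behavior is assumed:

>>> from doctr.datasets import CORD
>>> # full dataset: image paths with dict(boxes=..., labels=...) targets
>>> train_set = CORD(train=True, download=True)
>>> img, target = train_set[0]
>>> # rotated (4, 2) polygons instead of straight xmin/ymin/xmax/ymax boxes
>>> poly_set = CORD(train=True, download=True, use_polygons=True)
>>> # word crops paired with their transcriptions, for recognition training
>>> rec_set = CORD(train=True, download=True, recognition_task=True)
>>> crop, label = rec_set[0]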
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
- doctr.datasets.core - docTR documentation
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
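A similar sketch for the reworked FUNSD loader; note that in recognition mode the new code additionally drops labels containing unsupported characters (download behavior assumed as above):

>>> from doctr.datasets import FUNSD
>>> train_set = FUNSD(train=True, download=True)
>>> img, target = train_set[0]
>>> # detection-only targets: one array of boxes per image
>>> det_set = FUNSD(train=True, download=True, detection_task=True)
>>> img, boxes = det_set[0]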
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before passing it to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
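A usage sketch of the revised DataLoader: `workers` is gone, samples are now fetched sequentially, `__len__` reports the number of batches, and a custom `collate_fn` can be injected (the lambda below is illustrative):

>>> from doctr.datasets import CORD, DataLoader
>>> train_set = CORD(train=True, download=True)
>>> train_loader = DataLoader(train_set, shuffle=True, batch_size=32, drop_last=False)
>>> n_batches = len(train_loader)  # exposed by the new __len__
>>> train_iter = iter(train_loader)
>>> images, targets = next(train_iter)
>>> # override collation, e.g. to keep each batch as a plain list of samples
>>> raw_loader = DataLoader(train_set, batch_size=8, collate_fn=lambda samples: samples)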
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
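And for SROIE, whose annotations are now parsed into (N, 4, 2) point arrays before being optionally flattened to straight boxes:

>>> from doctr.datasets import SROIE
>>> train_set = SROIE(train=True, download=True)
>>> img, target = train_set[0]  # dict(boxes=..., labels=...)
>>> # keep the raw (N, 4, 2) polygons instead of xmin/ymin/xmax/ymax boxes
>>> poly_set = SROIE(train=True, download=True, use_polygons=True)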
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char still not in vocab, return unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
-
-
+
+
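A sketch of the extended encode_sequences signature above; the vocab and the eos/sos/pad indices are illustrative, chosen outside the vocab range as the new validity checks require, and the import paths are assumed:

>>> import numpy as np
>>> from doctr.datasets import encode_sequences
>>> vocab = "abcdefghijklmnopqrstuvwxyz"
>>> encoded = encode_sequences(
...     ["hello", "world"],
...     vocab,
...     target_size=10,
...     eos=len(vocab),      # EOS index, outside the vocab range
...     sos=len(vocab) + 1,  # optional SOS, rolled to position 0 of each row
...     pad=len(vocab) + 2,  # padding symbol; EOS is appended before padding
... )
>>> encoded.shape
(2, 10)
>>> # crop_bboxes_from_image cuts word crops out of an image, given (N, 4)
>>> # straight boxes or (N, 4, 2) polygons (the path is a placeholder)
>>> from doctr.datasets.utils import crop_bboxes_from_image
>>> crops = crop_bboxes_from_image("path/to/img.jpg", np.array([[10, 10, 100, 40]]))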
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- doctr.documents.elements - docTR documentation
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
\ No newline at end of file
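The deleted doctr.documents.elements module above defined the result hierarchy of the old API; a sketch of how those elements composed, per the removed source (relative coordinates are illustrative):

>>> from doctr.documents.elements import Word, Line, Block, Page, Document
>>> w1 = Word("hello", 0.99, ((0.1, 0.1), (0.2, 0.15)))
>>> w2 = Word("world", 0.98, ((0.25, 0.1), (0.35, 0.15)))
>>> line = Line([w1, w2])  # geometry resolved to the smallest enclosing bbox
>>> page = Page([Block(lines=[line])], page_idx=0, dimensions=(595, 842))
>>> Document(pages=[page]).render()
'hello world'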
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- doctr.documents.reader - docTR documentation
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document opened with PyMuPDF (fitz)
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Default goes to 840 x 595 for an A4 pdf;
- to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
\ No newline at end of file
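Likewise, the deleted doctr.documents.reader module carried the old DocumentFile entry points; a sketch assembled from its own docstrings (file paths are placeholders):

>>> from doctr.documents import DocumentFile
>>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
>>> pages = doc.as_images()  # list of H x W x 3 numpy ndarrays
>>> words = doc.get_words()  # per-page list of (bounding box, value) tuples
>>> pages = DocumentFile.from_images(["path/to/page1.png", "path/to/page2.png"])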
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to expand (unclip) polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: The first parameter.
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # Ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove boxes that are too small
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of output channels
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 1, 0, -1): # top-down pathway, deepest map first
- results[idx - 1] += self.upsample(results[idx])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature map is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.ndarray,
- ys: np.ndarray,
- a: np.ndarray,
- b: np.ndarray,
- eps: float = 1e-7,
- ) -> np.ndarray:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
-
- def draw_thresh_map(
- self,
- polygon: np.ndarray,
- canvas: np.ndarray,
- mask: np.ndarray,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon threshold map on a canvas, as described in the DB paper
-
- Args:
- polygon: array of coordinates defining the polygon boundary
- canvas: threshold map to fill with polygons
- mask: mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool) # np.bool is a deprecated alias of bool
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
\ No newline at end of file
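The box expansion in `DBPostProcessor.polygon_to_box` follows the DB paper: the offset distance is `area * unclip_ratio / perimeter`, applied outwards with pyclipper. A standalone sketch under those assumptions (shapely and pyclipper installed; the values are illustrative only):

import numpy as np
import pyclipper
from shapely.geometry import Polygon

points = np.array([[0, 0], [10, 0], [10, 4], [0, 4]])  # a 10 x 4 text region
poly = Polygon(points)
unclip_ratio = 1.5
distance = poly.area * unclip_ratio / poly.length  # 40 * 1.5 / 28 ≈ 2.14

offset = pyclipper.PyclipperOffset()
offset.AddPath(points.tolist(), pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
expanded = np.asarray(offset.Execute(distance)[0])
print(expanded.min(axis=0), expanded.max(axis=0))  # roughly (-2, -2) and (12, 6)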
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binzarized p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: probability map output by the LinkNet model
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num): # labels run from 0 (background) to label_num - 1
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool) # np.bool is a deprecated alias of bool
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
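`LinkNet.compute_target` above scales relative boxes to the output grid and fills them with ones. A toy reproduction of that step for a single image (shapes and values are illustrative, not taken from a real batch):

import numpy as np

output_shape = (1, 8, 8)                      # (batch, H, W)
boxes = np.array([[0.25, 0.25, 0.75, 0.75]])  # one relative (xmin, ymin, xmax, ymax)

seg_target = np.zeros(output_shape, dtype=bool)
abs_boxes = boxes.copy()
abs_boxes[:, [0, 2]] *= output_shape[-1]  # scale x coordinates to the grid width
abs_boxes[:, [1, 3]] *= output_shape[-2]  # scale y coordinates to the grid height
abs_boxes = abs_boxes.round().astype(np.int32)

for box in abs_boxes:
    seg_target[0, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
print(seg_target[0].astype(int))  # a 5 x 5 block of ones centred in the 8 x 8 grid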
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
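The rewritten zoo accepts either an architecture name or a model instance, and reparameterizes FAST models before wrapping them in a predictor. A short usage sketch mirroring the docstring example above (random input stands in for a real page):

import numpy as np
from doctr.models import detection_predictor

predictor = detection_predictor(arch="fast_base", pretrained=True, batch_size=2)
page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
out = predictor([page])  # one array of relative boxes per input page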
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
-
\ No newline at end of file
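All three removed converters returned serialized TFLite bytes, which the TFLite interpreter can consume directly. A sketch of that round trip with a toy model standing in for a docTR one (this is not the library's own test code):

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Conv2D(8, 3, input_shape=(32, 32, 3))])
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_bytes = converter.convert()  # same kind of payload as convert_to_tflite returned

interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=np.float32))
interpreter.invoke()  # runs the converted graph on a dummy input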
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with the CTC greedy decoder from the TensorFlow backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
- with the label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of ground-truth labels (strings) for the batch
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
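`CTCPostProcessor.ctc_decoder` relies on greedy CTC decoding: take the argmax per time step, merge repeats, then drop the blank index (the softmax applied in the listing does not change the argmax). A toy illustration of that behaviour, as a standalone sketch rather than the exact docTR call:

import tensorflow as tf

vocab = "ab"
blank = len(vocab)  # CTC blank is the last index, as in the model head above
logits = tf.constant([[[5., 0., 0.],    # 'a'
                       [5., 0., 0.],    # 'a' again -> merged with the previous step
                       [0., 0., 5.],    # blank
                       [0., 5., 0.]]])  # 'b'
decoded, _ = tf.nn.ctc_greedy_decoder(
    tf.transpose(logits, perm=[1, 0, 2]),  # time-major layout, as the decoder expects
    tf.constant([4]),
    merge_repeated=True,
)
dense = tf.sparse.to_dense(decoded[0], default_value=blank)
print("".join(vocab[i] for i in dense.numpy()[0] if i != blank))  # prints "ab"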
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
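# A shape walkthrough of the call above (a sketch, assuming features of shape
# (N, H, W, C) and a hidden state of shape (N, 1, 1, rnn_units)):
#   hidden_state_projector: (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
#   features_projector:     (N, H, W, C)         -> (N, H, W, attention_units)
#   tanh + attention_projector (broadcast sum)   -> (N, H, W, 1)
#   flatten + softmax over H * W, then reshape back to an attention map (N, H, W, 1)
#   weighting the features and summing over (H, W) yields one glimpse of shape (N, C)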
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, rnn_units)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, 2 * rnn_units) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
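# A worked sketch of the masking above (standalone, eager mode; the <eos> index
# is assumed to be vocab_size, consistent with the one-hot depth of vocab_size + 1):
# import tensorflow as tf
# logits = tf.random.normal([2, 5, 11])                   # (N, max_length + 1, vocab_size + 1)
# gt = tf.constant([[1, 2, 10, 0, 0], [3, 10, 0, 0, 0]])  # padded targets, 10 = <eos>
# seq_len = tf.constant([2, 1]) + 1                       # +1 keeps the <eos> step in the loss
# cce = tf.nn.softmax_cross_entropy_with_logits(tf.one_hot(gt, 11), logits)
# mask = tf.sequence_mask(seq_len, tf.shape(logits)[1])   # True up to and including <eos>
# loss = tf.reduce_sum(tf.where(mask, cce, tf.zeros_like(cce)), axis=1) / tf.cast(seq_len, tf.float32)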
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
\ No newline at end of file
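For reference, a minimal sketch of driving the removed SAR implementation above end to end (assuming the v0.2.0 TensorFlow API; the default input shape is (32, 128, 3), and the target words must only use characters from the default vocab):

import tensorflow as tf
from doctr.models import sar_resnet31

model = sar_resnet31(pretrained=False)
batch = tf.random.uniform(shape=[2, 32, 128, 3], maxval=1, dtype=tf.float32)
# inference: without a target, the output dict carries decoded word predictions
preds = model(batch)["preds"]
# training: passing targets enables teacher forcing and adds a masked cross-entropy loss
out = model(batch, target=["hello", "world"], training=True)
loss = out["loss"]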
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
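A short usage sketch of the updated recognition zoo entry point (assuming the 2021-2024 API added above; the crop is random data, and the word/confidence output pairing is the predictor's usual contract in recent releases):

import numpy as np
from doctr.models import recognition_predictor

predictor = recognition_predictor("crnn_vgg16_bn", pretrained=True, symmetric_pad=True, batch_size=32)
crops = [(255 * np.random.rand(32, 128, 3)).astype(np.uint8)]  # word-level crops, not full pages
words = predictor(crops)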
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
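And a combined usage sketch for the two high-level predictors defined above (assuming the added 2021-2024 API; the page is random data):

import numpy as np
from doctr.models import kie_predictor, ocr_predictor

page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)

ocr = ocr_predictor("db_resnet50", "crnn_vgg16_bn", pretrained=True, detect_language=True)
result = ocr([page])  # structured output: pages, blocks, lines, words

kie = kie_predictor(pretrained=True)  # defaults: det_arch="fast_base", reco_arch="crnn_vgg16_bn"
kie_result = kie([page])  # same interface, with predictions grouped by class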
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example::
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example::
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example::
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Gamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
\ No newline at end of file
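For reference, a small augmentation pipeline composed from the removed module above (assuming the v0.2.0 TensorFlow API; the mean/std values are the ones from the Normalize docstring example):

import tensorflow as tf
from doctr.transforms import (
    ColorInversion, Compose, Normalize, OneOf, RandomApply, RandomBrightness, RandomContrast, Resize
)

augment = Compose([
    Resize((32, 32), preserve_aspect_ratio=True),
    RandomApply(ColorInversion(min_val=0.6), p=0.5),
    OneOf([RandomBrightness(max_delta=0.3), RandomContrast(delta=0.3)]),
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
out = augment(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))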
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
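# A behaviour sketch of string_match (assumed eager evaluation):
# >>> string_match("Crème", "creme")
# (False, False, False, True)  # only the lower-case anyascii forms agree
# >>> string_match("EUR", "€")
# (False, False, True, True)   # anyascii("€") == "EUR", hence the ordering warning above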
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accent errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
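# A worked example for box_iou (assumed eager evaluation):
# >>> box_iou(np.array([[0, 0, 100, 100]]), np.array([[0, 0, 50, 100]]))
# array([[0.5]], dtype=float32)  # intersection 50 * 100 over union 100 * 100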
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
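# A usage sketch for nms (straight boxes with a trailing confidence score):
# >>> boxes = np.array([[0, 0, 10, 10, 0.9], [1, 1, 10, 10, 0.8], [20, 20, 30, 30, 0.7]])
# >>> nms(boxes, thresh=0.5)
# [0, 2]  # the middle box overlaps the first with IoU 0.81 and is suppressed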
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (over all detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
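One step above deserves a closer look: `scipy.optimize.linear_sum_assignment` solves the assignment problem on the negated IoU matrix (rows are ground truths, columns predictions), so the returned pairs maximize the total IoU, and any pair under `iou_thresh` is then dropped. A toy illustration of that matching step (values are made up):

>>> import numpy as np
>>> from scipy.optimize import linear_sum_assignment
>>> iou_mat = np.array([[0.7, 0.1], [0.2, 0.4]])  # iou_mat[i, j] = IoU(gt_i, pred_j)
>>> gt_indices, pred_indices = linear_sum_assignment(-iou_mat)  # negate to maximize IoU
>>> is_kept = iou_mat[gt_indices, pred_indices] >= 0.5
>>> kept = list(zip(gt_indices[is_kept], pred_indices[is_kept]))  # only (gt 0, pred 0) survives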
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (over all detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
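To sanity-check the doctest above by hand: the first prediction [0, 0, 70, 70] lies entirely inside the ground truth [0, 0, 100, 100], so the intersection is 70 × 70 and the union is just the ground-truth area:

>>> inter = 70 * 70                      # prediction fully contained in the ground truth
>>> union = 100 * 100 + 70 * 70 - inter  # = 10000
>>> inter / union                        # 0.49, just under iou_thresh=0.5
0.49

With no pair above the threshold, both recall and precision come out at 0 for these toy inputs.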
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
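A short sketch of the coordinate arithmetic above (the helper is module-level but not in `__all__`, so import it explicitly if you want to try this; values are illustrative):

>>> patch = rect_patch(((0.1, 0.1), (0.3, 0.2)), page_dimensions=(600, 800))
>>> patch.get_xy()      # (0.1 * 800, 0.1 * 600) -> (80.0, 60.0)
>>> patch.get_width()   # (0.3 - 0.1) * 800 -> 160.0
>>> patch.get_height()  # (0.2 - 0.1) * 600 -> 60.0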
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
+
+
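To illustrate the dispatch above: a 2-point tuple is routed to `rect_patch`, while 4 corner points go to `polygon_patch`. Note that `polygon_patch` rescales its input array in place, so pass a copy if you still need the relative coordinates afterwards (a sketch with made-up geometry):

>>> import numpy as np
>>> straight = ((0.1, 0.1), (0.3, 0.2))  # straight box: 2 (x, y) points
>>> rotated = np.array([[0.1, 0.1], [0.3, 0.1], [0.3, 0.2], [0.1, 0.2]])  # 4 corners, shape (4, 2)
>>> p1 = create_obj_patch(straight, (600, 800))        # -> matplotlib Rectangle
>>> p2 = create_obj_patch(rotated.copy(), (600, 800))  # -> matplotlib Polygon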
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
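A quick check of the generator above: hues are spread evenly around the color wheel while lightness and saturation are jittered, so the palette size is deterministic even though the exact shades are not:

>>> palette = get_colors(3)
>>> len(palette)
3
>>> all(0.0 <= channel <= 1.0 for rgb in palette for channel in rgb)
True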
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest window side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mpl Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
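Note the lazy `import mplcursors` above: hover support needs that optional dependency at call time, so a non-interactive call avoids it entirely. Continuing the docstring example (with the hypothetical `out` and `input_page` variables from it):

>>> fig = visualize_page(out[0].pages[0].export(), input_page, interactive=False)
>>> fig.savefig("page.png", bbox_inches="tight")  # static render, no mplcursors required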
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest window side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mpl Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
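Finally, a minimal usage sketch for `draw_boxes` (illustrative values; the function converts relative boxes to pixels, draws with OpenCV, then hands the image to matplotlib):

>>> import numpy as np
>>> image = np.zeros((200, 300, 3), dtype=np.uint8)  # (h, w, 3) canvas
>>> boxes = np.array([[0.1, 0.2, 0.5, 0.8]])         # relative (xmin, ymin, xmax, ymax)
>>> draw_boxes(boxes, image, color=(255, 0, 0))      # 2-px box edges in the given color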
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
-
-
+
+
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
-
-
+
+
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-..autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developper mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Either performed at once or separately, to each task corresponds a type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors lets you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, we warm-up the model and then we measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection will be used to produces cropped images that will be passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedure. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module regroups non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model performances.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset forPost-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-..autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
->>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
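-
-These vocabs are plain Python strings; assuming they are gathered in a VOCABS mapping (as in later docTR versions), a lookup sketch:
-
-- Example::
->>> from doctr.datasets import VOCABS
->>> len(VOCABS["french"])
-154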
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as a mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
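-
-A short sketch of encoding two words with a toy vocab (shapes and values here are illustrative):
-
-- Example::
->>> from doctr.datasets import encode_sequences
->>> encoded = encode_sequences(["hello", "hi"], vocab="abcdefghijklmnopqrstuvwxyz", target_size=8, eos=-1)
->>> encoded.shape
-(2, 8)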
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
- doctr.documents - docTR documentation
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page's size
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same height but in different columns belong to two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
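-
-A minimal sketch assembling a Document by hand from the elements above, assuming Document takes the list of pages as its first argument (coordinates are illustrative relative values):
-
-- Example::
->>> from doctr.documents import Word, Line, Block, Page, Document
->>> word = Word("hello", 0.99, ((0.1, 0.1), (0.3, 0.2)))
->>> page = Page(blocks=[Block(lines=[Line(words=[word])])], page_idx=0, dimensions=(842, 595))
->>> doc = Document(pages=[page])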
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert its pages into images in numpy format
-
-- Example::
->>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
->>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF file as a bytes stream
-
-- Example::
->>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
->>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
->>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
->>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
->>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
->>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor (see the sketch just after this list)
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
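+
+As a sketch of the "3 lines of code" claim above, assuming the standard doctr.io and doctr.models entry points:
+
+>>> from doctr.io import DocumentFile
+>>> from doctr.models import ocr_predictor
+>>> result = ocr_predictor(pretrained=True)(DocumentFile.from_pdf("path/to/your/doc.pdf"))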
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
ArtefactDetection
-
+
diff --git a/v0.1.0/using_doctr/using_datasets.html b/v0.1.0/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.0/using_doctr/using_datasets.html
+++ b/v0.1.0/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.0/using_doctr/using_model_export.html b/v0.1.0/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.0/using_doctr/using_model_export.html
+++ b/v0.1.0/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
<
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶<
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
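A minimal usage sketch of the revised CORD interface above (keyword names taken from the new signature in this diff; downloading and caching are handled by VisionDataset):

>>> from doctr.datasets import CORD
>>> # Default: samples are (image path, dict(boxes=(N, 4) array, labels=list of strings))
>>> train_set = CORD(train=True, download=True)
>>> img, target = train_set[0]
>>> # Recognition variant: samples become (cropped word image, text string)
>>> rec_set = CORD(train=True, download=True, recognition_task=True)
>>> crop, label = rec_set[0]
>>> # Passing recognition_task=True together with detection_task=True raises a ValueError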
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets.core - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
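The rewritten FUNSD loader follows the same pattern; a short sketch of the box formats it produces (shapes inferred from the code above):

>>> from doctr.datasets import FUNSD
>>> train_set = FUNSD(train=True, download=True)  # target["boxes"]: (N, 4) as xmin, ymin, xmax, ymax
>>> poly_set = FUNSD(train=True, download=True, use_polygons=True)  # target["boxes"]: (N, 4, 2) corner coordinates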
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
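A sketch of the reworked DataLoader above, exercising the new __len__ and the collate_fn override that replaces the removed workers argument (the identity collate below is purely illustrative):

>>> from doctr.datasets import CORD, DataLoader
>>> train_set = CORD(train=True, download=True)
>>> train_loader = DataLoader(train_set, shuffle=True, batch_size=32, drop_last=False)
>>> num_batches = len(train_loader)  # now supported via __len__
>>> images, targets = next(iter(train_loader))
>>> # Bypass the default stacking by supplying a custom collate function
>>> raw_loader = DataLoader(train_set, batch_size=8, collate_fn=lambda samples: samples)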
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
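The coordinate handling above turns each 8-value annotation row into either a (4, 2) polygon or a straight box; the same reduction in isolation, with made-up values:

>>> import numpy as np
>>> row = ["10", "20", "110", "20", "110", "60", "10", "60", "TOTAL"]
>>> poly = np.array(list(map(int, row[:8])), dtype=np.float32).reshape((4, 2))
>>> np.concatenate((poly.min(axis=0), poly.max(axis=0)))  # xmin, ymin, xmax, ymax
array([ 10.,  20., 110.,  60.], dtype=float32)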
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char still not in vocab, return unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: a array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
-
-
+
+
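A sketch of the extended encode_sequences above, with EOS/SOS/PAD indices deliberately chosen outside the toy vocab as the validity checks require (output traced through the logic shown: EOS is appended before padding, then SOS is rolled to the front):

>>> from doctr.datasets.utils import encode_sequences
>>> encode_sequences(["ab", "c"], vocab="abc", eos=3, sos=4, pad=5)
array([[4, 0, 1, 3, 5],
       [4, 2, 3, 5, 5]], dtype=int32)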
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents.elements - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
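For reference, a minimal sketch of the element tree defined by the module removed above (the legacy v0.2.0 API: relative-coordinate bounding boxes, render() for text, export() for nested dicts; values are illustrative):

>>> from doctr.documents.elements import Word, Line
>>> words = [Word("Hello", 0.99, ((0.1, 0.1), (0.3, 0.15))),
...          Word("world", 0.98, ((0.35, 0.1), (0.55, 0.15)))]
>>> line = Line(words)  # geometry defaults to the smallest bbox enclosing its words
>>> line.render()
'Hello world'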
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents.reader - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF opened as a fitz.Document
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Default goes to 840 x 595 for A4 pdf,
- if you want to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of per-page annotations, each represented as a list of (bounding box, value) tuples
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of per-page artefacts, each represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
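For orientation, a minimal usage sketch of the reader API defined above (hedged: the file paths are placeholders, and the calls follow this deleted v0.2.0 interface):

# Minimal sketch of the v0.2.0 reader API above (paths are placeholders)
from doctr.documents import DocumentFile

pdf_doc = DocumentFile.from_pdf("sample.pdf")    # PDF wrapper around a fitz.Document
pages = pdf_doc.as_images()                      # list of H x W x 3 numpy arrays
words = pdf_doc.get_words()                      # per page: list of ((xmin, ymin, xmax, ymax), value)
artefacts = pdf_doc.get_artefacts()              # per page: list of image bounding boxes
imgs = DocumentFile.from_images(["page1.png"])   # list of numpy arrays via read_img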
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to unshrink (expand) polygons
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: (N, 2) array of polygon vertices
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
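To make the unclip step concrete, a small standalone check mirroring polygon_to_box with unclip_ratio=1.5 (illustrative values only):

# Standalone check of the unclip offset used in polygon_to_box (illustrative)
import numpy as np
import pyclipper
from shapely.geometry import Polygon

points = [[0, 0], [100, 0], [100, 20], [0, 20]]     # a 100 x 20 text box
poly = Polygon(points)
distance = poly.area * 1.5 / poly.length            # 2000 * 1.5 / 240 = 12.5 px outward offset
offset = pyclipper.PyclipperOffset()
offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
expanded = np.asarray(offset.Execute(distance)[0])  # unshrunk polygon vertices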
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1): # top-down pass, from the coarsest map
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
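A shape trace clarifies the data flow (illustrative, assuming the 1024 x 1024 input and the four ResNet-50 stages listed in default_cfgs):

# Illustrative shape trace for the FPN above (1024 x 1024 input, fpn_channels=128):
#   inner 1x1 convs -> (256, 256, 128), (128, 128, 128), (64, 64, 128), (32, 32, 128)
#   top-down loop   -> results[idx] += upsample(results[idx + 1]) for idx = 2, 1, 0
#   layer_blocks    -> upsample by 1x / 2x / 4x / 8x, so every map reaches 256 x 256
#   concatenate     -> (256, 256, 4 * 128) fed to the probability and threshold heads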
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature map is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> np.ndarray:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- the distance map, of shape (height, width)
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon: (N, 2) array of coordinates delimiting the polygon boundary
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool) # np.bool was removed in NumPy 1.24
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, fully masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
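A quick count makes the balancing above concrete (illustrative numbers):

# With 1000 positive pixels under seg_mask, negative_count = min(#negatives, 3 * 1000),
# so tf.nn.top_k keeps only the 3000 hardest negatives (highest BCE) and the sum is
# normalized by positive_count + negative_count -- the 1:3 online hard-negative
# mining ratio used in the DB paper.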
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num): # labels run from 0 (background) to label_num - 1
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
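The additive skip connections above require matching shapes (illustrative bookkeeping):

# Each encoder stage halves the resolution and doubles the width (64 -> 128 -> 256 -> 512),
# while each decoder block doubles the resolution back and projects to the previous
# stage's width, so decoder_4(x_4) lines up with x_3, decoder_3(y_4 + x_3) with x_2,
# and so on -- element-wise sums rather than the channel concatenations of U-Net-style decoders.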
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool) # np.bool was removed in NumPy 1.24
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, fully masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
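A short usage sketch of the refactored predictor (hedged: pretrained-weight downloads and available constructors depend on the installed backend):

# Usage sketch: `arch` may be a registered name or an already-built model instance
import numpy as np
from doctr.models import detection_predictor, db_resnet50

by_name = detection_predictor(arch="fast_base", pretrained=True)
by_instance = detection_predictor(arch=db_resnet50(pretrained=True), assume_straight_pages=False)
page = (255 * np.random.rand(1024, 1024, 3)).astype(np.uint8)
out = by_name([page])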
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Representative dataset used to calibrate full-integer quantization
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
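All three converters return the serialized flatbuffer as raw bytes, so persisting a model is a plain binary write (illustrative sketch, reusing the `model` from the docstring examples above):

# Persisting any of the serialized models above (illustrative)
serialized = quantize_model(model, (224, 224, 3))
with open("model_int8.tflite", "wb") as f:
    f.write(serialized)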
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill([logits.shape[0]], logits.shape[1]), # one sequence length per batch element
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs CTC decoding of the raw model output, then maps the predictions
- back to characters with the label_to_idx dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
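A toy trace of the decoding above (illustrative, with <b> denoting the blank at index len(vocab)):

# per-frame argmax:  h h <b> e e l <b> l o
# merge_repeated ->  h <b> e l <b> l o
# drop blanks    ->  h e l l o   ("hello"; the blank preserves the double l)
# tf.sparse.to_dense then pads with len(vocab), which embeds as "<eos>", and the
# "<eos>" split truncates each string at that first padding symbol.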
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of ground-truth words, encoded internally via compute_target
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
-
- doctr.models.recognition.sar - docTR documentation
-
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
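
As a rough illustration of the glimpse computation above (random tensors standing in for the projected features; note the result is one C-dimensional vector per image, not a scalar):

    import tensorflow as tf

    features = tf.random.normal((2, 4, 8, 3))  # (N, H, W, C) feature maps
    scores = tf.random.normal((2, 4, 8, 1))  # unnormalized attention logits
    attention = tf.nn.softmax(tf.reshape(scores, (2, -1)))  # softmax over all H * W positions
    attention_map = tf.reshape(attention, (2, 4, 8, 1))
    glimpse = tf.reduce_sum(features * attention_map, axis=[1, 2])  # attention-weighted sum
    print(glimpse.shape)  # (2, 3): one C-dimensional glimpse per image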
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill([features.shape[0]], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
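
The symbol update above is plain teacher forcing; a toy sketch of the rule with hypothetical values:

    import tensorflow as tf

    logits = tf.constant([[0.1, 2.0, 0.3]])  # (N=1, vocab_size + 1) scores at step t
    gt_t = tf.constant([2], dtype=tf.int64)  # ground-truth symbol for step t
    training = True
    # feed the ground truth during training, the model's own prediction otherwise
    symbol = gt_t if training else tf.argmax(logits, axis=-1)
    print(symbol.numpy())  # [2] during training, [1] at inference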
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
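
A condensed, self-contained sketch of the masking logic above (hypothetical sizes; index 6 stands in for <eos>):

    import tensorflow as tf

    logits = tf.random.normal((2, 5, 7))  # (N, max_length + 1, vocab_size + 1)
    gt = tf.constant([[1, 2, 6, 0, 0], [3, 6, 0, 0, 0]])  # zero-padded after <eos>
    seq_len = tf.constant([2, 1]) + 1  # one extra step for <eos>
    cce = tf.nn.softmax_cross_entropy_with_logits(tf.one_hot(gt, depth=7), logits)
    mask = tf.sequence_mask(seq_len, maxlen=5)  # True up to and including <eos>
    masked = tf.where(mask, cce, tf.zeros_like(cce))
    loss = tf.reduce_sum(masked, axis=1) / tf.cast(seq_len, tf.float32)
    print(loss.shape)  # (2,): steps after <eos> contribute nothing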
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
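
The decoding boils down to an argmax followed by a cut at the first <eos>; a minimal pure-Python equivalent (hypothetical two-character vocab):

    import tensorflow as tf

    vocab = "ab"
    logits = tf.random.normal((1, 4, len(vocab) + 1))  # last class plays the role of <eos>
    pred = tf.math.argmax(logits, axis=2).numpy()[0]
    word = ""
    for idx in pred:
        if idx == len(vocab):  # stop at the first <eos>
            break
        word += vocab[idx]
    print(word)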
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example:
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
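
A usage sketch of the updated entry point (assuming a working docTR install; the exact output format may differ between releases):

    import numpy as np
    from doctr.models import recognition_predictor

    predictor = recognition_predictor("crnn_vgg16_bn", pretrained=True)
    crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)  # a single word crop
    print(predictor([crop]))  # e.g. [('word', 0.98)]: one (value, confidence) pair per crop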
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
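
And a matching usage sketch for the KIE entry point (same assumptions as above):

    import numpy as np
    from doctr.models import kie_predictor

    model = kie_predictor("db_resnet50", "crnn_vgg16_bn", pretrained=True)
    page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
    out = model([page])
    # unlike ocr_predictor, predictions come grouped by detection class rather than by block/line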
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
-
- doctr.transforms.modules - docTR documentation
-
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
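
A quick sketch of the aspect-ratio-preserving branch (a portrait input is scaled to fit, then zero-padded bottom/right):

    import tensorflow as tf

    img = tf.random.uniform((64, 32, 3))  # portrait input
    resized = tf.image.resize(img, (32, 32), preserve_aspect_ratio=True)  # -> (32, 16, 3)
    padded = tf.image.pad_to_bounding_box(resized, 0, 0, 32, 32)  # zero-pad to target size
    print(padded.shape)  # (32, 32, 3)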
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example:
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example:
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example:
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Gamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = JpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
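
Putting a few of these modules together, a sketch of a small augmentation pipeline (all class names come from the module's __all__ above):

    import tensorflow as tf
    from doctr.transforms import Compose, OneOf, RandomApply, RandomBrightness, RandomContrast

    augment = Compose([
        RandomApply(RandomBrightness(max_delta=0.2), p=0.5),  # applied half of the time
        OneOf([RandomContrast(delta=0.2), RandomBrightness(max_delta=0.1)]),  # pick exactly one
    ])
    out = augment(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))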
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
 gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
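
A quick numeric check of box_iou (assuming the function above is in scope):

    import numpy as np

    boxes_1 = np.array([[0, 0, 10, 10]])
    boxes_2 = np.array([[0, 0, 10, 10], [5, 5, 15, 15]])
    print(box_iou(boxes_1, boxes_2))  # [[1.0, ~0.143]]: 25 overlap / 175 union for the second pair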
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+ mask_shape: spatial shape of the intermediate masks
+ use_broadcasting: if set to True, leverage broadcasting speedup by consuming more memory
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
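
And the rotated counterpart (same assumption; two axis-aligned squares overlapping by half):

    import numpy as np

    polys_1 = np.array([[[0, 0], [10, 0], [10, 10], [0, 10]]], dtype=float)
    polys_2 = np.array([[[5, 0], [15, 0], [15, 10], [5, 10]]], dtype=float)
    print(polygon_iou(polys_1, polys_2))  # [[~0.333]]: 50 overlap / 150 union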
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
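
A small check of the suppression behaviour (assuming the nms function above is in scope):

    import numpy as np

    boxes = np.array([
        [0.0, 0.0, 10.0, 10.0, 0.9],
        [1.0, 1.0, 10.0, 10.0, 0.8],  # heavy overlap with the first box
        [20.0, 20.0, 30.0, 30.0, 0.7],
    ])
    print(nms(boxes, thresh=0.5))  # keeps indexes 0 and 2; the weaker duplicate is suppressed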
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
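Note that pair assignment now goes through SciPy's Hungarian solver rather than a greedy matcher: minimizing the negated IoU matrix maximizes the total IoU over one-to-one pairings, which a greedy pass can miss. A small illustration on a made-up IoU matrix:

    >>> import numpy as np
    >>> from scipy.optimize import linear_sum_assignment
    >>> iou_mat = np.array([[0.6, 0.55], [0.5, 0.1]])  # rows: ground truths, columns: predictions
    >>> gt_idx, pred_idx = linear_sum_assignment(-iou_mat)
    >>> list(zip(gt_idx, pred_idx))  # -> [(0, 1), (1, 0)] for a total IoU of 1.05; greedy would pick (0, 0) and (1, 1) for only 0.7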
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
 pred_boxes: np.ndarray,
 gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
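The four counters updated above come from a `string_match` helper defined or imported elsewhere in this module. Its exact body is outside this excerpt, but here is a plausible sketch of the four comparison levels it returns, an illustrative assumption built on the `anyascii` transliteration package rather than the verbatim implementation:

    from typing import Tuple

    from anyascii import anyascii

    def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
        """Compare two strings: exact, case-insensitive, ASCII-transliterated, then both relaxations combined."""
        raw = word1 == word2
        caseless = word1.lower() == word2.lower()
        ascii_match = anyascii(word1) == anyascii(word2)
        unicase = anyascii(word1).lower() == anyascii(word2).lower()
        return raw, caseless, ascii_match, unicase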
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
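Working through the doctest above: the single ground truth [0, 0, 100, 100] overlaps the first prediction [0, 0, 70, 70] with IoU = (70 * 70) / (100 * 100) = 0.49, just under the 0.5 threshold, and the second prediction not at all. No pair is kept, so summary() should report a recall and precision of 0.0 and a mean IoU of round(0.49 / 2, 2) = 0.24.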
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor was run with preserve_aspect_ratio=True, so coordinates are rescaled consistently
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
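For instance, on a page of dimensions (600, 800) given as (height, width), a relative box spanning 10-30% of the width and 10-20% of the height becomes an absolute Rectangle anchored at (80, 60) with size 160 x 60. A minimal sketch:

    >>> patch = rect_patch(((0.1, 0.1), (0.3, 0.2)), (600, 800))
    >>> patch.get_xy(), patch.get_width(), patch.get_height()  # -> ((80.0, 60.0), 160.0, 60.0)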
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor was run with preserve_aspect_ratio=True, so coordinates are rescaled consistently
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
+
+
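The dispatcher above routes on the geometry format: a 2-point tuple yields a Rectangle, while 4 points (as a tuple or a (4, 2) array) yield a Polygon. A short illustration with made-up relative coordinates:

    >>> import numpy as np
    >>> create_obj_patch(((0.1, 0.1), (0.3, 0.2)), (600, 800))  # straight box -> matplotlib Rectangle
    >>> create_obj_patch(np.array([[0.1, 0.1], [0.3, 0.1], [0.3, 0.2], [0.1, 0.2]]), (600, 800))  # rotated box -> matplotlib Polygon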
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
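Hues are spaced evenly around the color wheel while lightness and saturation are lightly jittered, so distinct prediction keys get visually distinct colors:

    >>> colors = get_colors(3)  # three (r, g, b) triplets at hues 0, 120 and 240 degrees; exact values vary per call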
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
 image: np array of the page, which needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mlp Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, which needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mlp Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
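A minimal usage sketch on a blank test image (made-up coordinates):

    >>> import numpy as np
    >>> image = np.zeros((200, 300, 3), dtype=np.uint8)
    >>> draw_boxes(np.array([[0.1, 0.1, 0.5, 0.4]], dtype=np.float32), image)  # draws one rectangle from (30, 20) to (150, 80), then displays it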
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-..autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developper mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Either performed at once or separately, to each task corresponds a type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors lets you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, we warm-up the model and then we measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection will be used to produces cropped images that will be passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedure. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module regroups non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model performances.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
- doctr.datasets - docTR documentation
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
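-As a purely illustrative sketch (not part of the original API reference), a custom dataset can reuse this verified-download logic by subclassing it; the URL, file name and archive flag below are hypothetical placeholders:
-
-- Example::
>>> from doctr.datasets.core import VisionDataset
>>> class MyDataset(VisionDataset):
...     def __init__(self, **kwargs):
...         # hypothetical URL and file name, for illustration only
...         super().__init__(url="https://example.com/my_dataset.zip",
...                          file_name="my_dataset.zip", extract_archive=True, **kwargs)
>>> ds = MyDataset(download=True)  # fetches and extracts the archive if not on disk
-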
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-docTR Vocabs¶
-
-Name           size   characters
-digits         10     0123456789
-ascii_letters  52     abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-punctuation    32     !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-currency       5      £€¥¢฿
-latin          96     0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-french         154    0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
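-As a minimal usage sketch (illustrative values, using the digits vocab string from the table above):
-
-- Example::
>>> from doctr.datasets import encode_sequences
>>> # "0123456789" is the digits vocab; shorter sequences are padded with the eos value
>>> encoded = encode_sequences(sequences=["42", "007"], vocab="0123456789", target_size=5)
>>> encoded.shape  # one row per sequence, padded up to target_size
(2, 5)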
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
- doctr.documents - docTR documentation
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size
-
-
-
-
-
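-For illustration (the value, confidence and coordinates below are made up), a Word is built directly from its documented fields, with the box expressed in relative coordinates:
-
-- Example::
>>> from doctr.documents import Word
>>> word = Word(value="docTR", confidence=0.98, geometry=((0.12, 0.30), (0.26, 0.34)))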
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same height but in different columns belong to two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. a QR code, picture, chart, signature or logo).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and the confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
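-Putting the structure together, here is an illustrative sketch (all values made up, and assuming Document takes the list of pages as its first argument) that assembles the hierarchy by hand, from Word up to Document:
-
-- Example::
>>> from doctr.documents import Word, Line, Artefact, Block, Page, Document
>>> word = Word(value="Hello", confidence=0.99, geometry=((0.1, 0.1), (0.2, 0.15)))
>>> line = Line(words=[word])  # geometry defaults to the box enclosing all words
>>> artefact = Artefact(artefact_type="logo", confidence=0.8, geometry=((0.7, 0.1), (0.9, 0.2)))
>>> block = Block(lines=[line], artefacts=[artefact])
>>> page = Page(blocks=[block], page_idx=0, dimensions=(595, 842))
>>> doc = Document(pages=[page])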
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert its pages into images in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF file
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple file formats
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩‍🔬 for research: quickly compare your own architectures’ speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
Using your ONNX exported model
-
+
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.0/using_doctr/using_models.html
+++ b/v0.1.0/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/cord.html b/v0.1.1/_modules/doctr/datasets/cord.html
index 78e70014e3..55b0584830 100644
--- a/v0.1.1/_modules/doctr/datasets/cord.html
+++ b/v0.1.1/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -462,7 +462,7 @@ Source code for doctr.datasets.cord
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/detection.html b/v0.1.1/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.1.1/_modules/doctr/datasets/detection.html
+++ b/v0.1.1/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.1.1/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/funsd.html b/v0.1.1/_modules/doctr/datasets/funsd.html
index e52abc5428..f08612f9fa 100644
--- a/v0.1.1/_modules/doctr/datasets/funsd.html
+++ b/v0.1.1/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.funsd
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.1.1/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic03.html b/v0.1.1/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.1.1/_modules/doctr/datasets/ic03.html
+++ b/v0.1.1/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ic13.html b/v0.1.1/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.1.1/_modules/doctr/datasets/ic13.html
+++ b/v0.1.1/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiit5k.html b/v0.1.1/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.1.1/_modules/doctr/datasets/iiit5k.html
+++ b/v0.1.1/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/iiithws.html b/v0.1.1/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.1.1/_modules/doctr/datasets/iiithws.html
+++ b/v0.1.1/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/imgur5k.html b/v0.1.1/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.1.1/_modules/doctr/datasets/imgur5k.html
+++ b/v0.1.1/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/loader.html b/v0.1.1/_modules/doctr/datasets/loader.html
index d1785caa1c..ed80350ef0 100644
--- a/v0.1.1/_modules/doctr/datasets/loader.html
+++ b/v0.1.1/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -429,7 +429,7 @@ Source code for doctr.datasets.loader
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/mjsynth.html b/v0.1.1/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.1.1/_modules/doctr/datasets/mjsynth.html
+++ b/v0.1.1/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/ocr.html b/v0.1.1/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.1.1/_modules/doctr/datasets/ocr.html
+++ b/v0.1.1/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/recognition.html b/v0.1.1/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.1.1/_modules/doctr/datasets/recognition.html
+++ b/v0.1.1/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/sroie.html b/v0.1.1/_modules/doctr/datasets/sroie.html
index 94c963390e..04cf10bda2 100644
--- a/v0.1.1/_modules/doctr/datasets/sroie.html
+++ b/v0.1.1/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.sroie
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svhn.html b/v0.1.1/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.1.1/_modules/doctr/datasets/svhn.html
+++ b/v0.1.1/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/svt.html b/v0.1.1/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.1.1/_modules/doctr/datasets/svt.html
+++ b/v0.1.1/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/synthtext.html b/v0.1.1/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.1.1/_modules/doctr/datasets/synthtext.html
+++ b/v0.1.1/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/utils.html b/v0.1.1/_modules/doctr/datasets/utils.html
index 9defb17ba5..bde9304597 100644
--- a/v0.1.1/_modules/doctr/datasets/utils.html
+++ b/v0.1.1/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -554,7 +554,7 @@ Source code for doctr.datasets.utils
-
+
diff --git a/v0.1.1/_modules/doctr/datasets/wildreceipt.html b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.1.1/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.1.1/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.1.1/_modules/doctr/io/elements.html b/v0.1.1/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.1.1/_modules/doctr/io/elements.html
+++ b/v0.1.1/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.1.1/_modules/doctr/io/html.html b/v0.1.1/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.1.1/_modules/doctr/io/html.html
+++ b/v0.1.1/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/base.html b/v0.1.1/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.1.1/_modules/doctr/io/image/base.html
+++ b/v0.1.1/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.1.1/_modules/doctr/io/image/tensorflow.html b/v0.1.1/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.1.1/_modules/doctr/io/image/tensorflow.html
+++ b/v0.1.1/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/io/pdf.html b/v0.1.1/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.1.1/_modules/doctr/io/pdf.html
+++ b/v0.1.1/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.1.1/_modules/doctr/io/reader.html b/v0.1.1/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.1.1/_modules/doctr/io/reader.html
+++ b/v0.1.1/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/classification/zoo.html b/v0.1.1/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.1.1/_modules/doctr/models/classification/zoo.html
+++ b/v0.1.1/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/detection/zoo.html b/v0.1.1/_modules/doctr/models/detection/zoo.html
index 312f4584ab..3651c4e2d3 100644
--- a/v0.1.1/_modules/doctr/models/detection/zoo.html
+++ b/v0.1.1/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -450,7 +450,7 @@ Source code for doctr.models.detection.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/factory/hub.html b/v0.1.1/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.1.1/_modules/doctr/models/factory/hub.html
+++ b/v0.1.1/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.1.1/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/models/recognition/zoo.html b/v0.1.1/_modules/doctr/models/recognition/zoo.html
index 2c47f88de4..f664304019 100644
--- a/v0.1.1/_modules/doctr/models/recognition/zoo.html
+++ b/v0.1.1/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -415,7 +415,7 @@ Source code for doctr.models.recognition.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/models/zoo.html b/v0.1.1/_modules/doctr/models/zoo.html
index 5b22f2c79f..d459671648 100644
--- a/v0.1.1/_modules/doctr/models/zoo.html
+++ b/v0.1.1/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -576,7 +576,7 @@ Source code for doctr.models.zoo
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/base.html b/v0.1.1/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/base.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.1.1/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.1.1/_modules/doctr/utils/metrics.html b/v0.1.1/_modules/doctr/utils/metrics.html
index d35d7e9672..8a37d5949a 100644
--- a/v0.1.1/_modules/doctr/utils/metrics.html
+++ b/v0.1.1/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -936,7 +936,7 @@ Source code for doctr.utils.metrics
-
+
diff --git a/v0.1.1/_modules/doctr/utils/visualization.html b/v0.1.1/_modules/doctr/utils/visualization.html
index e608d492a4..c818be6d7b 100644
--- a/v0.1.1/_modules/doctr/utils/visualization.html
+++ b/v0.1.1/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -720,7 +720,7 @@ Source code for doctr.utils.visualization
-
+
diff --git a/v0.1.1/_modules/index.html b/v0.1.1/_modules/index.html
index 758ef41bd0..5793c44f20 100644
--- a/v0.1.1/_modules/index.html
+++ b/v0.1.1/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -378,7 +378,7 @@ All modules for which code is available
-
+
diff --git a/v0.1.1/_sources/getting_started/installing.rst.txt b/v0.1.1/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.1.1/_sources/getting_started/installing.rst.txt
+++ b/v0.1.1/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.1.1/_static/basic.css b/v0.1.1/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.1.1/_static/basic.css
+++ b/v0.1.1/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.1.1/_static/doctools.js b/v0.1.1/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.1.1/_static/doctools.js
+++ b/v0.1.1/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.1.1/_static/language_data.js b/v0.1.1/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.1.1/_static/language_data.js
+++ b/v0.1.1/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.1.1/_static/searchtools.js b/v0.1.1/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.1.1/_static/searchtools.js
+++ b/v0.1.1/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.1.1/changelog.html b/v0.1.1/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.1.1/changelog.html
+++ b/v0.1.1/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
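The hunk above replaces the TensorFlow-specific CORD loader with a framework-agnostic one. A minimal usage sketch of the new constructor, assuming doctr >= v0.5 is installed and the archive URLs above are reachable (the flag names are taken straight from the new signature):

from doctr.datasets import CORD

# Full dataset: each target is a dict of non-negative absolute boxes and their labels
train_set = CORD(train=True, download=True)
img, target = train_set[0]

# (4, 2) corner polygons instead of straight (xmin, ymin, xmax, ymax) boxes
poly_set = CORD(train=True, download=True, use_polygons=True)

# Text crops for recognition; combining recognition_task and detection_task raises ValueError
rec_set = CORD(train=True, download=True, recognition_task=True)
crop, text = rec_set[0]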
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets.core - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# # List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
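FUNSD gains the same task flags as CORD. A short sketch under the same assumptions (downloadable archive, the new signature shown in the hunk above):

from doctr.datasets import FUNSD

# detection_task=True keeps only (image, boxes) pairs
det_set = FUNSD(train=False, download=True, detection_task=True)
img, boxes = det_set[0]  # boxes: (N, 4) float32, or (N, 4, 2) with use_polygons=True

# As in CORD, setting both task flags is rejected
try:
    FUNSD(train=True, download=True, recognition_task=True, detection_task=True)
except ValueError as err:
    print(err)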
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
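The DataLoader rewrite drops the multithreaded `workers` argument in favour of an optional `collate_fn`, and adds `__len__`. A sketch of the updated usage, assuming the TensorFlow backend targeted by this module:

import tensorflow as tf

from doctr.datasets import CORD, DataLoader

train_set = CORD(train=True, download=True)

# Optional override; when omitted, the dataset's own collate_fn (or default_collate) is used
def collate(samples):
    images, targets = zip(*samples)
    return tf.stack(images, axis=0), list(targets)

train_loader = DataLoader(train_set, batch_size=32, collate_fn=collate)
print(len(train_loader))  # number of batches, via the new __len__
images, targets = next(iter(train_loader))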
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
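The SROIE rewrite vectorises the old per-row min/max reduction: each row's first eight fields become a (4, 2) polygon, which collapses to a straight box only when use_polygons=False. A standalone numpy sketch of that reordering on made-up annotation rows:

import numpy as np

# First 8 fields are the 4 (x, y) corners; the remainder is the label,
# which may itself contain commas (hence the join over row[8:])
_rows = [
    ["10", "20", "110", "20", "110", "60", "10", "60", "TOTAL"],
    ["12", "80", "90", "80", "90", "105", "12", "105", "1", "250"],
]
labels = [",".join(row[8:]) for row in _rows]  # ['TOTAL', '1,250']
coords = np.stack(
    [np.array(list(map(int, row[:8])), dtype=np.float32).reshape((4, 2)) for row in _rows], axis=0
)  # (N, 4, 2) polygons

# Straight-box reduction applied when use_polygons=False
boxes = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
print(boxes[0])  # [ 10.  20. 110.  60.] -> xmin, ymin, xmax, ymax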
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionnary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or the char is still not in vocab, use the unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as an upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their class names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
-
-
+
+
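encode_sequences now supports explicit SOS and PAD symbols on top of EOS padding. A toy sketch of the padding rules spelled out in the docstring above (the vocab and symbol values are illustrative):

from doctr.datasets.utils import encode_sequences

vocab = "abc"
words = ["ab", "c"]

# EOS-only: rows are filled with the eos symbol after the encoded characters
print(encode_sequences(words, vocab, eos=3))
# [[0 1 3]
#  [2 3 3]]

# With pad: each word is followed by exactly one EOS, then pad symbols
print(encode_sequences(words, vocab, eos=3, pad=4))
# [[0 1 3 4]
#  [2 3 4 4]]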
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents.elements - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
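For reference, the element API deleted above composed a document bottom-up from words. A minimal sketch of how it was used in v0.2.0 (geometries are illustrative relative coordinates):

from doctr.documents.elements import Block, Document, Line, Page, Word

words = [
    Word("Hello", 0.99, ((0.1, 0.1), (0.3, 0.15))),
    Word("world", 0.98, ((0.32, 0.1), (0.5, 0.15))),
]
line = Line(words)  # geometry resolved to the smallest enclosing bbox
page = Page(blocks=[Block(lines=[line])], page_idx=0, dimensions=(595, 842))
doc = Document(pages=[page])
print(doc.render())  # "Hello world"
print(doc.export()["pages"][0]["dimensions"])  # (595, 842)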
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents.reader - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
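-
-# Usage sketch (illustrative, not part of the original module): read_img also
-# accepts raw bytes, which is handy for in-memory files.
-# >>> data = open("path/to/your/doc.jpg", 'rb').read()
-# >>> page = read_img(data, output_size=(1024, 1024))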
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and open it with PyMuPDF
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document opened as a fitz.Document
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 pdf;
- to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified, where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the original one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy array
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # fitz pixmaps are RGB-ordered, so only swap channels when BGR output is requested
- if not rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
-
- return img
-
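-
-# Worked example (illustrative, not part of the original module): an A4 page
-# has a MediaBox of 595 x 842 points at 72 dpi. With the default scales of
-# (2, 2), the rendered image is about 1190 x 1684 pixels, i.e. 144 dpi.
-# Passing output_size=(1684, 1190) would compute the same scales explicitly:
-# >>> scales = (1190 / 595, 1684 / 842)  # -> (2.0, 2.0)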
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a web page and convert it into a PDF as a bytes stream
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert them into images in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
-
- doctr.models.detection.differentiable_binarization - docTR documentation
-
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to unshrink polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and return a 4-point box
-
- Args:
- points: the polygon coordinates to expand
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
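-
- # Worked example (illustrative, not part of the original module): for a
- # 40 x 40 square, area = 1600 and perimeter = 160, so with the default
- # unclip_ratio = 1.5 the offset distance is 1600 * 1.5 / 160 = 15 px and
- # the expanded box is roughly 70 x 70, restoring the margin that the
- # shrunk training targets removed.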
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove boxes that are too small
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
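-
- # Output sketch (illustrative, not part of the original module): each row
- # of the returned array is (xmin, ymin, xmax, ymax, score) with coordinates
- # relative to the page dimensions, e.g. [0.12, 0.30, 0.45, 0.36, 0.98] for
- # a word box spanning a third of the page width.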
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1): # top-down pass: fuse each map with the upsampled coarser one
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature map is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.ndarray,
- ys: np.ndarray,
- a: np.ndarray,
- b: np.ndarray,
- eps: float = 1e-7,
- ) -> np.ndarray:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- the distance map of each (xs, ys) point to the [ab] segment, of shape (height, width)
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
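-
- # Geometry note (illustrative, not part of the original module): cosin is
- # derived from the law of cosines, and the main branch returns the
- # perpendicular distance of each grid point to the line (ab). E.g. with
- # a = (0, 0), b = (10, 0) and the point (5, 3), the result is 3; for
- # (13, 4) the projection falls outside [ab] (cosin < 0), so the distance
- # to the nearest endpoint, 5, is returned instead.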
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon threshold map on a canvas, as described in the DB paper
-
- Args:
- polygon: array of coordinates defining the boundary of the polygon
- canvas: threshold map to fill with polygons
- mask: mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
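-
- # Worked example (illustrative, not part of the original module): with
- # shrink_ratio = 0.4, a 100 x 40 box has area A = 4000 and perimeter
- # L = 280, so the padding distance is A * (1 - 0.4**2) / L = 12 px. The
- # threshold map is then filled with 1 - d / 12 inside the padded polygon,
- # where d is the distance to the closest polygon edge.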
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool) # np.bool was removed from modern numpy
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
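-
- # Summary (illustrative, not part of the original module): the objective is
- # loss = 5 * balanced_BCE(prob_map) + dice(bin_map) + 10 * L1(thresh_map),
- # where bin_map = sigmoid(50 * (prob_map - thresh_map)) is the
- # differentiable-binarization surrogate introduced in the DB paper.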
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
-
- doctr.models.detection.linknet - docTR documentation
-
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num): # label 0 is the background component
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
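-
- # Shape sketch (illustrative, not part of the original module): each encoder
- # halves the spatial size (64 -> 128 -> 256 -> 512 channels) while each
- # decoder block upsamples by 2 and reduces channels, so every skip sum
- # (e.g. y_4 + x_3, both with 256 channels) operates at matching resolutions.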
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image,
- then compute the loss with the probability map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@ Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
-
- doctr.models.export - docTR documentation
-
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized TFLite model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
-
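-
-# Usage sketch (illustrative, not part of the original module): the bytes
-# returned by the converters above can be written to disk or run directly
-# through the TFLite interpreter.
-# >>> serialized_model = convert_to_fp16(model)
-# >>> interpreter = tf.lite.Interpreter(model_content=serialized_model)
-# >>> interpreter.allocate_tensors()
-# >>> # feed inputs with interpreter.set_tensor(...), then:
-# >>> interpreter.invoke()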
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
-
- doctr.models.recognition.crnn - docTR documentation
-
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
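-
- # Worked example (illustrative, not part of the original module): greedy
- # CTC decoding keeps the best class per time step, merges repeats, then
- # drops the blank token. With vocab "ab" (blank index 2), the per-step
- # argmax [0, 0, 2, 1, 1, 2, 0] collapses to [0, 1, 0], i.e. the word "aba".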
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
- with label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of ground-truth label strings
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
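The transpose-and-reshape in call above is the core CRNN trick: the width axis of the feature map becomes the sequence axis fed to the bidirectional LSTMs. A short sketch with arbitrary dummy dimensions:

import tensorflow as tf

features = tf.random.uniform((1, 4, 32, 64))            # B, H, W, C (stand-in for backbone output)
transposed = tf.transpose(features, perm=[0, 2, 1, 3])  # B, W, H, C
w, h, c = transposed.get_shape().as_list()[1:]
seq = tf.reshape(transposed, shape=(-1, w, h * c))      # B, W, H * C: one feature vector per image column
print(seq.shape)  # (1, 32, 256)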
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
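Because _crnn patches the default configuration with any vocab or rnn_units found in kwargs, the character set can be swapped at construction time. A hedged sketch (the digit-only vocabulary is hypothetical and would not match the pretrained checkpoint's output head, hence pretrained=False):

from doctr.models import crnn_vgg16_bn

model = crnn_vgg16_bn(pretrained=False, vocab="0123456789", rnn_units=64)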
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
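The shape comments in AttentionModule can be checked end-to-end with dummy tensors; a minimal sketch, assuming the class defined above is in scope and using arbitrary dimensions:

import tensorflow as tf

attention = AttentionModule(attention_units=128)
features = tf.random.uniform((2, 8, 32, 512))     # (N, H, W, C) backbone feature map
hidden_state = tf.random.uniform((2, 1, 1, 512))  # (N, 1, 1, rnn_units)
glimpse = attention(features, hidden_state)
print(glimpse.shape)  # (2, 512): one attended feature vector per sample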
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill([features.shape[0]], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length: number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
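The per-timestep masking in SAR.compute_loss above (every position after the <eos> slot is zeroed out before averaging) can be illustrated in isolation; a small sketch with toy lengths:

import tensorflow as tf

seq_len = tf.constant([2, 4])            # word lengths, before the +1 for <eos>
input_len = 5                            # number of decoder timesteps
per_step_loss = tf.ones((2, input_len))  # stand-in for the cross-entropy values
mask_2d = tf.sequence_mask(seq_len + 1, input_len)
masked = tf.where(mask_2d, per_step_loss, tf.zeros_like(per_step_loss))
print(masked.numpy())
# [[1. 1. 1. 0. 0.]
#  [1. 1. 1. 1. 1.]]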
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
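As the updated _predictor shows, arch now accepts either a registered architecture name or an already-instantiated model; a short sketch of both calling conventions:

from doctr.models import recognition, recognition_predictor

# By architecture name
predictor = recognition_predictor(arch="crnn_vgg16_bn", pretrained=True)

# By passing a model instance directly (must be one of the accepted recognition classes)
model = recognition.crnn_vgg16_bn(pretrained=True)
predictor = recognition_predictor(arch=model)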
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
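A brief usage sketch of the expanded signature (the flag values chosen here are purely illustrative):

import numpy as np
from doctr.models import ocr_predictor

model = ocr_predictor(
    det_arch="db_resnet50",
    reco_arch="crnn_vgg16_bn",
    pretrained=True,
    assume_straight_pages=False,    # handle rotated text
    export_as_straight_boxes=True,  # but still export axis-aligned boxes
)
page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
out = model([page])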
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example::
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example::
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example::
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example::
- >>> from doctr.transforms import RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
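Since OneOf and RandomApply are themselves transformations, they nest inside Compose; a hedged sketch of a small augmentation pipeline built from the modules above:

import tensorflow as tf
from doctr.transforms import Compose, RandomApply, OneOf, RandomGamma, RandomBrightness, RandomContrast

pipeline = Compose([
    RandomApply(RandomGamma(), p=0.5),              # gamma jitter half the time
    OneOf([RandomBrightness(), RandomContrast()]),  # exactly one photometric change
])
out = pipeline(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))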
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
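A quick illustration of the four tolerance levels, relying on anyascii stripping the accent:

from doctr.utils.metrics import string_match

print(string_match("Café", "cafe"))
# (False, False, False, True): only the lower-case anyascii forms agree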
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
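A worked example for box_iou: a prediction covering exactly the left half of a ground-truth box yields an IoU of 0.5:

import numpy as np
from doctr.utils.metrics import box_iou

gt = np.array([[0, 0, 100, 100]], dtype=np.float32)
pred = np.array([[0, 0, 50, 100]], dtype=np.float32)
print(box_iou(gt, pred))  # [[0.5]]: intersection 5000 / union 10000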
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
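An analogous check for polygon_iou, with two overlapping squares expressed as 4-point polygons:

import numpy as np
from doctr.utils.metrics import polygon_iou

poly_a = np.array([[[0, 0], [2, 0], [2, 2], [0, 2]]], dtype=np.float32)  # 2x2 square
poly_b = np.array([[[1, 0], [3, 0], [3, 2], [1, 2]]], dtype=np.float32)  # shifted right by 1
print(polygon_iou(poly_a, poly_b))  # [[0.3333...]]: intersection 2 / union 6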
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: IoU threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
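A small worked example for nms: the second box overlaps the first with IoU ~= 0.68, above the 0.5 threshold, so only indices 0 and 2 are kept:

>>> import numpy as np
>>> dets = np.array([
...     [0, 0, 100, 100, 0.9],
...     [10, 10, 110, 110, 0.8],
...     [200, 200, 300, 300, 0.7],
... ])
>>> nms(dets, thresh=0.5)  # -> [0, 2]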
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes, either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ preds: a set of relative bounding boxes, either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ ... ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes, either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes, either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ ... np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes, either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes, either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall, precision and mean IoU scores
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
+
+
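A short sketch of the dispatch performed by create_obj_patch; the relative coordinates below are illustrative only:

>>> import numpy as np
>>> rect = create_obj_patch(((0.1, 0.1), (0.4, 0.3)), (600, 800), label="word", color=(0, 0, 1))
>>> type(rect).__name__  # a 2-point tuple goes through rect_patch
'Rectangle'
>>> poly = create_obj_patch(np.array([[0.1, 0.1], [0.4, 0.1], [0.4, 0.3], [0.1, 0.3]]), (600, 800))
>>> type(poly).__name__  # a (4, 2) array goes through polygon_patch
'Polygon'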
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
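For instance (exact values vary from run to run, since lightness and saturation are randomized):

>>> palette = get_colors(3)
>>> len(palette)
3
>>> all(0.0 <= c <= 1.0 for rgb in palette for c in rgb)
True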
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mpl Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mpl Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
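A minimal sketch of draw_boxes on a blank canvas (the canvas size and the box are illustrative):

>>> import numpy as np
>>> canvas = np.zeros((100, 200, 3), dtype=np.uint8)
>>> rel_boxes = np.array([[0.1, 0.1, 0.4, 0.5]], dtype=np.float32)
>>> draw_boxes(rel_boxes, canvas, color=(0, 255, 0))  # draws a rectangle from (20, 10) to (80, 50)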
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
-
-
+
+
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
-
-
+
+
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-..autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developper mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Either performed at once or separately, to each task corresponds a type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors lets you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, we warm-up the model and then we measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection will be used to produces cropped images that will be passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedure. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module regroups non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model performances.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
- doctr.datasets - docTR documentation
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
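Since VisionDataset is abstract, concrete datasets subclass it and forward download information to its constructor. A minimal sketch of the pattern (the dataset name, URL and archive name below are hypothetical, shown only for illustration):

>>> from doctr.datasets.core import VisionDataset
>>> class MyDataset(VisionDataset):
...     def __init__(self, train: bool = True, **kwargs):
...         # hypothetical URL and archive name, for illustration only
...         super().__init__(
...             url="https://example.com/my_dataset.zip",
...             file_name="my_dataset.zip",
...             extract_archive=True,
...             **kwargs,
...         )
...         self.train = train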
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its own way of loading a sample, but batch aggregation and the underlying iterator are handled by another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-docTR Vocabs¶
-
-Name           Size  Characters
-digits         10    0123456789
-ascii_letters  52    abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-punctuation    32    !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-currency       5     £€¥¢฿
-latin          96    0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-french         154   0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
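For reference, these character sets can also be retrieved programmatically; a hedged sketch (assuming the VOCABS mapping exposed by doctr.datasets, which may differ across versions):

>>> from doctr.datasets import VOCABS  # assumed export: dict mapping name -> character string
>>> vocab = VOCABS["french"]
>>> len(vocab)
154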
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
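To make the encoding concrete, here is a small usage sketch (the output shape follows from the argument descriptions above; the exact padding values are an assumption):

>>> from doctr.datasets import encode_sequences
>>> encoded = encode_sequences(
...     sequences=["cat", "dog"],
...     vocab="abcdefghijklmnopqrstuvwxyz",
...     target_size=5,
... )
>>> encoded.shape  # one padded row of length target_size per input sequence
(2, 5)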
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
- doctr.documents - docTR documentation
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page's size
-
-
-
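For illustration, a Word can be built directly from the signature documented above (the values are made up; geometry uses page-relative coordinates):

>>> from doctr.documents import Word
>>> # a word covering roughly the top-left area of the page
>>> word = Word(value="docTR", confidence=0.98, geometry=((0.1, 0.2), (0.3, 0.25)))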
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, text at the same height in different columns forms two separate Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
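Putting the structural elements together, a hedged sketch of assembling the hierarchy by hand (the Document constructor is assumed to take the list of pages, mirroring the other elements above):

>>> from doctr.documents import Word, Line, Block, Page, Document
>>> word = Word("Hello", 0.99, ((0.1, 0.1), (0.2, 0.12)))
>>> line = Line([word])  # geometry resolved from the enclosed words
>>> block = Block(lines=[line])
>>> page = Page(blocks=[block], page_idx=0, dimensions=(595, 842))  # (width, height) per the doc above
>>> doc = Document(pages=[page])  # assumed signature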
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert its pages into images in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page at the given URL and convert it into a PDF byte stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-Getting Started¶
-- Installation
-Contents¶
@@ -364,7 +381,7 @@ Contents
-
+
diff --git a/v0.1.1/community/resources.html b/v0.1.1/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.1.1/community/resources.html
+++ b/v0.1.1/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.1.1/contributing/code_of_conduct.html b/v0.1.1/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.1.1/contributing/code_of_conduct.html
+++ b/v0.1.1/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
-
+
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶<
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
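Taken together, the cord.py hunks above replace the TensorFlow-specific `__getitem__`/`collate_fn` pair with framework-agnostic targets and introduce three loading modes (plain, polygon, recognition/detection), with `convert_target_to_relative` registered as a pre-transform whenever `recognition_task` is False. A minimal usage sketch in the doctest style the docstring itself uses; it relies only on the constructor shown above, plus the `download` keyword inherited from `VisionDataset`:

>>> from doctr.datasets import CORD
>>> # Default mode: each target is a dict of "boxes" (int, clipped to >= 0) and "labels"
>>> train_set = CORD(train=True, download=True)
>>> img, target = train_set[0]
>>> # use_polygons=True keeps all four corners instead of [xmin, ymin, xmax, ymax]
>>> poly_set = CORD(train=True, download=True, use_polygons=True)
>>> # recognition_task=True yields (word crop, text) pairs via crop_bboxes_from_image
>>> reco_set = CORD(train=True, download=True, recognition_task=True)
>>> # detection_task=True yields (image, boxes) only; combining it with
>>> # recognition_task raises the ValueError added in the hunk above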
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
- doctr.datasets.core - docTR documentation
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
\ No newline at end of file
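The deletion above removes the original `doctr.datasets.core` module; the `from .datasets import VisionDataset` lines in the cord.py and funsd.py hunks show where the class now lives. Judging from those call sites alone, the relocated class keeps the positional interface documented in the deleted docstring (url, file_name, file_hash, extract_archive), followed by keyword arguments such as `download`. A hypothetical subclass, for illustration only (the `MyReceipts` name, URL, and file name are placeholders, not part of docTR):

>>> from doctr.datasets.datasets import VisionDataset
>>> class MyReceipts(VisionDataset):
...     def __init__(self, **kwargs):
...         super().__init__(
...             "https://example.com/my_receipts.zip",  # placeholder URL
...             "my_receipts.zip",                      # cached file name
...             None,                                   # skip the SHA256 check
...             True,                                   # extract the zip archive
...             **kwargs,                               # e.g. download=True
...         )
...         self.data = []  # subclasses fill this with (image, target) pairs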
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# # List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
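Aside: the `use_polygons` branch above rewrites each straight box as its four corner points. A minimal sketch of that conversion on its own (the sample box values are made up):

    import numpy as np

    # A straight box in (xmin, ymin, xmax, ymax) format
    box = [10, 20, 110, 60]
    # Corner order: top left, top right, bottom right, bottom left
    polygon = np.array(
        [[box[0], box[1]], [box[2], box[1]], [box[2], box[3]], [box[0], box[3]]],
        dtype=np.float32,
    )
    assert polygon.shape == (4, 2)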
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
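Aside: the batch count above is plain floor/ceil arithmetic over `len(dataset) / batch_size`, and `default_collate` simply stacks the samples with `tf.stack`. The counting logic in isolation (the sizes are illustrative):

    import math

    dataset_len, batch_size = 103, 32
    nb = dataset_len / batch_size
    # drop_last=True discards the trailing partial batch; drop_last=False keeps it
    assert math.floor(nb) == 3  # three full batches of 32
    assert math.ceil(nb) == 4   # plus one final batch of 7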
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
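Aside: with `use_polygons=False`, the (N, 4, 2) corner array built above collapses to straight (xmin, ymin, xmax, ymax) boxes via a per-polygon min/max. The same reduction standalone (coordinates are made up):

    import numpy as np

    coords = np.array([[[10, 20], [110, 22], [108, 60], [12, 58]]], dtype=np.float32)  # (1, 4, 2)
    straight = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
    assert straight.shape == (1, 4)  # -> [[10., 20., 110., 60.]]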
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char still not in vocab, fall back to the unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
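# --- Illustrative aside (not part of the diff above): expected encode_sequences behavior ---
# With vocab "abc" (a=0, b=1, c=2), eos=3 and pad=4 (both outside vocab indices):
#   encode_sequences(["ab", "c"], vocab="abc", target_size=5, eos=3, pad=4)
# appends EOS to each word, then pads with PAD, giving
#   [[0, 1, 3, 4, 4],
#    [2, 3, 4, 4, 4]]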
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: a array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
-
-
+
+
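Aside: `pre_transform_multiclass` above groups polygons by class name before stacking them per class. The grouping step on its own (class names and polygons below are hypothetical):

    import numpy as np

    boxes = np.random.rand(3, 4, 2).astype(np.float32)  # three relative polygons
    classes = ["artefacts", "words", "words"]
    grouped = {k: [] for k in sorted(set(classes))}
    for k, poly in zip(classes, boxes):
        grouped[k].append(poly)
    grouped = {k: np.stack(v, axis=0) for k, v in grouped.items()}
    assert grouped["words"].shape == (2, 4, 2) and grouped["artefacts"].shape == (1, 4, 2)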
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- doctr.documents.elements - docTR documentation
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
\ No newline at end of file
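Aside: the deleted listing above is the legacy element tree. A short sketch of how those classes composed under the old `doctr.documents` API (geometries are made up):

    from doctr.documents.elements import Word, Line, Block, Page, Document

    w1 = Word("Hello", 0.99, ((0.1, 0.1), (0.3, 0.15)))
    w2 = Word("world", 0.98, ((0.32, 0.1), (0.5, 0.15)))
    line = Line([w1, w2])  # geometry resolved to the smallest enclosing bbox
    page = Page(blocks=[Block(lines=[line])], page_idx=0, dimensions=(595, 842))
    doc = Document(pages=[page])
    print(doc.render())  # -> "Hello world"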
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- doctr.documents.reader - docTR documentation
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document opened as a fitz.Document
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Open the document with fitz; pages are rendered to numpy later via convert_page_to_numpy
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 pdf;
- to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
\ No newline at end of file
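Aside: the legacy reader offered one entry point per source type. Typical usage against that old `doctr.documents` API (paths are placeholders):

    from doctr.documents import DocumentFile

    pages = DocumentFile.from_images("path/to/page1.png")  # list of H x W x 3 ndarrays
    pdf = DocumentFile.from_pdf("path/to/doc.pdf")         # PDF wrapper around a fitz.Document
    images = pdf.as_images()                               # rendered pages as ndarrays
    words = pdf.get_words()                                # per page: list of (bbox, value) tuples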
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
<
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to expand (unclip) polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: polygon points as an array of shape (N, 2)
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
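# --- Illustrative aside (not part of the listing above): the unclip step in numbers ---
# polygon_to_box expands a polygon by distance = area * unclip_ratio / perimeter, then
# takes the bounding rect of the offset result. For an axis-aligned 100 x 100 square with
# unclip_ratio = 1.5: distance = (100 * 100) * 1.5 / 400 = 37.5, so the box grows to
# roughly 175 x 175 before cv2.boundingRect trims it to integer pixel coordinates.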
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or fewer
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1): # top-down pathway: merge from the coarsest level
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
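- # Illustrative sketch (added for clarity, not part of the original module): merging
- # four pyramid levels, assuming ResNet-50 feature maps for a 1024x1024 input.
- # >>> import tensorflow as tf
- # >>> fpn = FeaturePyramidNetwork(channels=128)
- # >>> fmaps = [tf.random.uniform((1, 1024 // s, 1024 // s, c))
- # ... for s, c in zip((4, 8, 16, 32), (256, 512, 1024, 2048))]
- # >>> fpn(fmaps).shape # every level mapped to 128 channels and upsampled to 256x256
-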
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature maps is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.ndarray,
- ys: np.ndarray,
- a: np.ndarray,
- b: np.ndarray,
- eps: float = 1e-7,
- ) -> np.ndarray:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs: map of x coordinates (height, width)
- ys: map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- the distance map of each (x, y) point to the segment, same shape as xs and ys
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
-
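- # Illustrative sketch (added for clarity, not part of the original module): distance
- # from each pixel of a 4x4 grid to the horizontal segment [(0, 0), (3, 0)].
- # >>> import numpy as np
- # >>> xs, ys = np.meshgrid(np.arange(4, dtype=float), np.arange(4, dtype=float))
- # >>> DBNet.compute_distance(xs, ys, np.array([0., 0.]), np.array([3., 0.]))
- # # each row is constant: a pixel at height y lies y pixels from the segment
-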
- def draw_thresh_map(
- self,
- polygon: np.ndarray,
- canvas: np.ndarray,
- mask: np.ndarray,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon threshold map on a canvas, as described in the DB paper
-
- Args:
- polygon: array of coordinates defining the boundary of the polygon
- canvas: threshold map to fill with polygons
- mask: mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
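- # Illustrative sketch (added for clarity, not part of the original module): drawing
- # the threshold border of one box, assuming a DBNet instance `model` is available.
- # >>> import numpy as np
- # >>> poly = np.array([[10, 10], [50, 10], [50, 30], [10, 30]])
- # >>> canvas = np.zeros((64, 64), dtype=np.float32)
- # >>> mask = np.zeros((64, 64), dtype=np.uint8)
- # >>> _, canvas, mask = model.draw_thresh_map(poly, canvas, mask)
-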
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, fully masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
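- # Illustrative sketch (added for clarity, not part of the original docs): computing
- # the training loss by passing targets; boxes are in relative coordinates.
- # >>> import numpy as np
- # >>> import tensorflow as tf
- # >>> model = db_resnet50(pretrained=False)
- # >>> x = tf.random.uniform((1, 1024, 1024, 3))
- # >>> target = [{'boxes': np.array([[.1, .1, .4, .2]], dtype=np.float32),
- # ... 'flags': np.array([False])}]
- # >>> out = model(x, target=target, training=True)
- # >>> float(out['loss']) # scalar combining balanced BCE, dice and L1 terms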
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from the LinkNet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- array of boxes for the bitmap, where each box is a 5-element list
- containing xmin, ymin, xmax, ymax, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num): # label 0 is the background
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
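- # Illustrative sketch (added for clarity, not part of the original module): extracting
- # one box from a toy binary map; pred and bitmap are (H, W) arrays here.
- # >>> import numpy as np
- # >>> bitmap = np.zeros((64, 64), dtype=np.uint8)
- # >>> bitmap[10:20, 10:40] = 1
- # >>> LinkNetPostProcessor().bitmap_to_boxes(bitmap.astype(np.float32), bitmap)
- # # -> one (xmin, ymin, xmax, ymax, score) row, in relative coordinates
-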
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
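- # Illustrative sketch (added for clarity, not part of the original module): a decoder
- # block doubles the spatial resolution while mapping in_chan to out_chan channels.
- # >>> import tensorflow as tf
- # >>> block = decoder_block(in_chan=128, out_chan=64)
- # >>> block(tf.random.uniform((1, 32, 32, 128))).shape # -> (1, 64, 64, 64)
-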
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, fully masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
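+# Illustrative sketch (added for clarity, not part of the module): a model instance
+# can be passed instead of an architecture name.
+# >>> from doctr.models import detection, detection_predictor
+# >>> model = detection.db_resnet50(pretrained=True)
+# >>> predictor = detection_predictor(arch=model, assume_straight_pages=True)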
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs CTC decoding of the raw model output, then maps the predicted
- indices to characters with the vocabulary embedding
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
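- # Illustrative sketch (added for clarity, not part of the original module): decoding
- # random logits, assuming a vocabulary string `vocab` is available.
- # >>> import tensorflow as tf
- # >>> proc = CTCPostProcessor(vocab=vocab)
- # >>> proc(tf.random.uniform((2, 32, len(vocab) + 1))) # a list of 2 decoded words
-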
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of target strings to encode
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
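- # Illustrative sketch (added for clarity, not part of the original docs): computing
- # the CTC training loss by passing target strings.
- # >>> import tensorflow as tf
- # >>> model = crnn_vgg16_bn(pretrained=False)
- # >>> out = model(tf.random.uniform((2, 32, 128, 3)), target=['hello', 'world'], training=True)
- # >>> out['loss'].shape # per-sample CTC loss, shape (2,)
-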
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
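- # Illustrative sketch (added for clarity, not part of the original module): one
- # attention step producing a glimpse vector from features and a hidden state.
- # >>> import tensorflow as tf
- # >>> att = AttentionModule(attention_units=512)
- # >>> features = tf.random.uniform((1, 4, 32, 512))
- # >>> hidden = tf.random.uniform((1, 1, 1, 512))
- # >>> att(features, hidden).shape # -> (1, 512), one glimpse per sample
-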
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, 1)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + 1) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
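In practice, the predictor can now be built either from an architecture name or from an instantiated model. A short usage sketch (crop shapes are illustrative):

import numpy as np
from doctr.models import recognition_predictor

predictor = recognition_predictor("crnn_vgg16_bn", pretrained=True, batch_size=32)
# word crops as HWC uint8 arrays
crops = [(255 * np.random.rand(32, 128, 3)).astype(np.uint8)]
out = predictor(crops)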
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the general page orientation
+ based on the median line orientation of the segmentation map,
+ then rotates the page before passing it to the detection module again.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
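Putting these options together, a typical call might look like the following sketch (the random page stands in for a real document):

import numpy as np
from doctr.models import ocr_predictor

model = ocr_predictor(
    det_arch="db_resnet50",
    reco_arch="crnn_vgg16_bn",
    pretrained=True,
    assume_straight_pages=False,    # handle rotated text
    export_as_straight_boxes=True,  # but export axis-aligned boxes
)
page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
result = model([page])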
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the general page orientation
+ based on the median line orientation of the segmentation map,
+ then rotates the page before passing it to the detection module again.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
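Usage mirrors ocr_predictor, with a KIEPredictor returned instead. A sketch:

import numpy as np
from doctr.models import kie_predictor

model = kie_predictor("db_resnet50", "crnn_vgg16_bn", pretrained=True)
page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
out = model([page])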
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example::
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: the offset added to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example::
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: the multiplicative contrast factor is picked in [1-delta, 1/(1-delta)] (contrast is reduced if the factor is below 1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example::
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: the multiplicative saturation factor is picked in [1-delta, 1/(1-delta)] (saturation is reduced if the factor is below 1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: the hue offset is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example::
- >>> from doctr.transforms import RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: minimum JPEG quality, int in [0, 100]
- max_quality: maximum JPEG quality, int in [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
-
\ No newline at end of file
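The modules in this (since removed) file were meant to be chained. A sketch of a typical pipeline, reusing the mean/std values from the Normalize example above:

import tensorflow as tf
from doctr.transforms import Compose, Normalize, OneOf, RandomApply, RandomGamma, RandomJpegQuality, Resize

transfos = Compose([
    Resize((32, 32)),
    RandomApply(OneOf([RandomGamma(), RandomJpegQuality(min_quality=60)]), p=0.5),
    Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))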
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
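The four tolerance levels are strictly ordered from exact to most permissive; for instance (import path assumed from the module shown here):

from doctr.utils.metrics import string_match

print(string_match("Hello", "hello"))  # (False, True, False, True)
print(string_match("École", "ecole"))  # (False, False, False, True): only unicase matches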
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
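A worked example, with the boxes used elsewhere in this module's docstrings:

import numpy as np
from doctr.utils.metrics import box_iou

boxes_1 = np.array([[0, 0, 100, 100]])
boxes_2 = np.array([[0, 0, 70, 70]])
# intersection = 70 * 70 = 4900, union = 10000 + 4900 - 4900 = 10000
print(box_iou(boxes_1, boxes_2))  # [[0.49]]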
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
+
+
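A sketch with two overlapping axis-aligned squares expressed as 4-point polygons:

import numpy as np
from doctr.utils.metrics import polygon_iou

polys_1 = np.array([[[0, 0], [2, 0], [2, 2], [0, 2]]], dtype=np.float32)
polys_2 = np.array([[[1, 1], [3, 1], [3, 3], [1, 3]]], dtype=np.float32)
# intersection area = 1, union area = 4 + 4 - 1 = 7
print(polygon_iou(polys_1, polys_2))  # [[~0.143]]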
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
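A small worked example (scores in the last column):

import numpy as np
from doctr.utils.metrics import nms

boxes = np.array([
    [0, 0, 100, 100, 0.9],      # kept: highest score
    [10, 10, 110, 110, 0.8],    # IoU with the first box ~0.68 > 0.5: suppressed
    [200, 200, 300, 300, 0.7],  # disjoint: kept
])
print(nms(boxes, thresh=0.5))  # -> [0, 2]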
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the overall recall & precision of the class predictions and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor was instantiated with preserve_aspect_ratio=True
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor was instantiated with preserve_aspect_ratio=True
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mlp Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mlp Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
-
-
+
+
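
To make the new `draw_boxes` helper above concrete, a minimal sketch (the signature follows the source shown in this hunk; the sample boxes and the blank image are illustrative only):

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from doctr.utils.visualization import draw_boxes
>>> image = np.zeros((600, 800, 3), dtype=np.uint8)
>>> # relative straight boxes of shape (*, 4): (xmin, ymin, xmax, ymax)
>>> boxes = np.asarray([[0.1, 0.1, 0.4, 0.2], [0.5, 0.3, 0.9, 0.4]])
>>> draw_boxes(boxes, image)  # rectangles are drawn with cv2, then displayed via matplotlib
>>> plt.show()
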
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
-
-
+
+
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-..autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
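
Although this page is removed by the diff, the `encode_sequences` helper it documented still illustrates the vocab mechanics; a minimal sketch based on the signature listed above (the padding behavior noted in the comment is an assumption, not verified here):

>>> from doctr.datasets import encode_sequences
>>> # maps each character to its index in the vocab
>>> encoded = encode_sequences(["123", "45"], vocab="0123456789", target_size=4, eos=-1)
>>> # assumed: returns an ndarray of shape (len(sequences), target_size), padded with the eos index
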
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
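
For reference, a short sketch of the `DocumentFile` API this removed page documented (method names and return types as listed above; the file paths are placeholders):

>>> from doctr.documents import DocumentFile
>>> pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
>>> pages = pdf_doc.as_images()   # list of H x W x 3 numpy arrays
>>> words = pdf_doc.get_words()   # per-page list of (bounding box, value) tuples
>>> imgs = DocumentFile.from_images(["path/to/your/page1.png"])
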
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developper mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Either performed at once or separately, to each task corresponds a type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors lets you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, we warm-up the model and then we measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection will be used to produces cropped images that will be passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
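
The removed page above documented the two-stage end-to-end predictors; for reference, a minimal sketch using the `ocr_db_crnn` entry point that appears in the visualization docstrings elsewhere in these docs (input nesting and the export call are copied from there; the random page is illustrative):

>>> import numpy as np
>>> from doctr.models import ocr_db_crnn  # pretrained detection + recognition pipeline
>>> model = ocr_db_crnn(pretrained=True)
>>> page = (255 * np.random.rand(1024, 1024, 3)).astype(np.uint8)
>>> out = model([[page]])  # nested list: documents -> pages
>>> result = out[0].pages[0].export()  # structured dict of blocks, lines and words
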
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedure. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
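
As a pointer for the removed page above, a minimal composition sketch using the transformation classes it listed (the constructor arguments shown are assumptions for illustration, not taken from this page):

>>> from doctr.transforms import Compose, Resize, ToGray
>>> transfo = Compose([Resize((32, 128)), ToGray()])  # output size (H, W) is assumed
>>> # the composed object applies each transformation in order to an input image tensor
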
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module regroups non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model performances.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
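
A short usage sketch for the localization metric listed above. This is a sketch under assumptions: `update` is assumed to take ground-truth then predicted relative boxes, and `summary` to return the aggregated recall, precision and mean IoU:

>>> import numpy as np
>>> from doctr.utils.metrics import LocalizationConfusion
>>> metric = LocalizationConfusion(iou_thresh=0.5)
>>> metric.update(np.asarray([[0.1, 0.1, 0.4, 0.2]]), np.asarray([[0.11, 0.1, 0.42, 0.21]]))
>>> recall, precision, mean_iou = metric.summary()
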
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
-
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset forPost-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-..autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = CORD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before passing it to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-¶
-
-
-
-
-
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
-
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-size (the page's)
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degress and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
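For the package route referenced on this installation page, a hedged sketch (assuming the PyPI distribution name python-doctr with framework extras; pick the extra matching the backend you installed):

pip install "python-doctr[torch]"   # or "python-doctr[tf]" for the TensorFlow backend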
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor (see the sketch just after this list)
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
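As an illustration of the three-line claim above, a minimal sketch (assuming the current doctr.io / doctr.models interface and pretrained weights):

>>> from doctr.io import DocumentFile
>>> from doctr.models import ocr_predictor
>>> result = ocr_predictor(pretrained=True)(DocumentFile.from_pdf("path/to/your/doc.pdf"))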
-
-🧑🔬 Build & train your predictor¶
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-🧰 Implemented models¶
-
-Detection models¶
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-Recognition models¶
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+CRNN from “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
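To pick among the architectures listed above by name, a brief sketch (assuming the doctr.models zoo interface; db_resnet50 and crnn_vgg16_bn are two of the names documented elsewhere in these pages):

>>> from doctr.models import ocr_predictor
>>> predictor = ocr_predictor(det_arch="db_resnet50", reco_arch="crnn_vgg16_bn", pretrained=True)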
-
-🧾 Integrated datasets¶
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
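To load one of the datasets above, a short sketch (assuming the doctr.datasets interface; FUNSD is used since it heads the list, and download=True fetches the data on first use):

>>> from doctr.datasets import FUNSD
>>> train_set = FUNSD(train=True, download=True)
>>> img, target = train_set[0]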
-
-Getting Started¶
-
-- Installation
-
-Contents¶
-
+
@@ -364,7 +381,7 @@ Contents
Attribution
diff --git a/v0.1.1/contributing/contributing.html b/v0.1.1/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.1.1/contributing/contributing.html
+++ b/v0.1.1/contributing/contributing.html
@@ -14,7 +14,7 @@
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
diff --git a/v0.1.1/genindex.html b/v0.1.1/genindex.html
index cbb43f08d8..21520455b4 100644
--- a/v0.1.1/genindex.html
+++ b/v0.1.1/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -756,7 +756,7 @@ W
diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.1.1/getting_started/installing.html
+++ b/v0.1.1/getting_started/installing.html
@@ -14,7 +14,7 @@
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶<
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
Search - docTR documentation
@@ -340,7 +340,7 @@
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+    use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+            + "To get the whole dataset with boxes and labels, leave both parameters set to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
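A note on the hunk above: CORD's constructor drops the TensorFlow-specific `sample_transforms` in favour of framework-agnostic flags. A minimal usage sketch, based only on the docstring and signature shown (the first call assumes network access so the archive can be downloaded):

from doctr.datasets import CORD

# use_polygons=True keeps the 4 (x, y) corners instead of reducing them to xmin/ymin/xmax/ymax
train_set = CORD(train=True, download=True, use_polygons=True)
img, target = train_set[0]  # target: dict(boxes=np.ndarray, labels=list of strings)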
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
- doctr.datasets.core - docTR documentation
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
\ No newline at end of file
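`doctr/datasets/core.py` is removed rather than rewritten: the `VisionDataset` base class now lives in the `datasets` module, as the `from .datasets import VisionDataset` imports above show. A hedged sketch of the subclassing pattern the updated datasets follow; the URL, archive name, and hash below are placeholders for illustration, not real endpoints:

from doctr.datasets.datasets import VisionDataset  # new home of the class deleted here


class MyDataset(VisionDataset):
    def __init__(self, **kwargs) -> None:
        # positional contract kept from the old class: url, file_name, file_hash, extract_archive
        super().__init__("https://example.com/my_dataset.zip", "my_dataset.zip", None, True, **kwargs)
        self.data = []  # subclasses populate this with (sample, target) pairs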
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+    use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+        + "To get the whole dataset with boxes and labels, leave both parameters set to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
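FUNSD gains the same three flags as CORD. As a sketch, `detection_task=True` swaps the dict target for a bare box array (float32 per the code above; shape (N, 4), or (N, 4, 2) when combined with `use_polygons=True`):

from doctr.datasets import FUNSD

det_set = FUNSD(train=True, download=True, detection_task=True)
img, boxes = det_set[0]  # boxes: np.ndarray of word boxes, no text labels attached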
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
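Two behavioural changes fall out of the loader hunk: batching now maps `__getitem__` sequentially (the multithreaded executor and its `workers` argument are gone), and the loader reports its batch count via the new `__len__`. A short sketch reusing the docstring's own example, assuming the dataset download succeeds:

from doctr.datasets import CORD, DataLoader

train_set = CORD(train=True, download=True)
train_loader = DataLoader(train_set, shuffle=True, batch_size=32, drop_last=False)
print(len(train_loader))  # number of batches per epoch
images, targets = next(iter(train_loader))  # collate_fn defaults to the dataset's own, else default_collate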
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+    use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+            + "To get the whole dataset with boxes and labels, leave both parameters set to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+            # reorder the 8 flat coordinates into a (4, 2) array of (x, y) corners
+            # (top left, top right, bottom right, bottom left); blank rows were already filtered out above
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
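SROIE shows the `recognition_task` path most clearly: word boxes are cropped out of each page and yielded as (crop, label) pairs, with degenerate crops and empty labels skipped (see the shape and length guard above). A minimal sketch:

from doctr.datasets import SROIE

reco_set = SROIE(train=False, download=True, recognition_task=True)
crop, word = reco_set[0]  # crop: np.ndarray image patch, word: its transcription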
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char is still not in vocab, fall back to the unknown character
char = unknown_char
translated += char
return translated
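A minimal usage sketch (hedged: it assumes "french" is registered in VOCABS and that 'ß', which has no NFD/ASCII decomposition, is absent from that vocab, so it falls back to unknown_char):
>>> from doctr.datasets.utils import translate
>>> translate("Straße", "french")
'Stra■e'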
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
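Since the encoding is just a positional lookup in the vocab string, a quick sketch:
>>> encode_string("cab", "abc")
[2, 0, 1]
>>> encode_string("cab!", "abc")  # '!' is missing from the vocab -> ValueError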
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
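And the inverse mapping, decoding indices back through the same vocab:
>>> import numpy as np
>>> decode_sequence(np.array([2, 0, 1]), "abc")
'cab'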
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
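A short sketch of the default behaviour (no sos/pad, so rows are right-filled with eos; the expected output below is derived from the code above, not measured):
>>> import numpy as np
>>> encode_sequences(["cat", "go"], vocab="abcdefghijklmnopqrstuvwxyz", eos=26)
array([[ 2,  0, 19, 26],
       [ 6, 14, 26, 26]], dtype=int32)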
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
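A hedged usage sketch for the straight-box branch (the image path is hypothetical, and the expected crop shape assumes (xmin, ymin, xmax, ymax) slicing in extract_crops):
>>> import numpy as np
>>> boxes = np.array([[10, 10, 120, 50]])  # one absolute-pixel box
>>> crops = crop_bboxes_from_image("path/to/page.jpg", boxes)  # hypothetical path -> [array of shape (40, 110, 3)]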
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
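A minimal sketch of the grouping it performs (here img stands for any tensor accepted by get_img_shape; shapes are illustrative):
>>> polys = np.zeros((3, 4, 2))
>>> _, d = pre_transform_multiclass(img, (polys, ["words", "date", "words"]))  # img: stand-in tensor
>>> {k: v.shape for k, v in d.items()}  # {'date': (1, 4, 2), 'words': (2, 4, 2)}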
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- doctr.documents.elements - docTR documentation
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
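For reference, a small interactive sketch of Word (geometry is relative to the page, as documented above):
>>> w = Word("hello", 0.99, ((0.1, 0.1), (0.4, 0.2)))
>>> w.render()
'hello'
>>> w.export()  # {'value': 'hello', 'confidence': 0.99, 'geometry': ((0.1, 0.1), (0.4, 0.2))}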
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- doctr.documents.reader - docTR documentation
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file with PyMuPDF and open it as a fitz.Document
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document loaded with PyMuPDF (fitz)
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Default goes to 840 x 595 for A4 pdf,
- if you want to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy ndarray
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a web page and convert it into a PDF document as a bytes stream
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to unshrink polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: The first parameter.
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly casted to a ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
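The expansion distance follows the DB unclip rule, distance = area * unclip_ratio / perimeter; e.g. for a 100x20 rectangle with the default unclip_ratio of 1.5:
>>> from shapely.geometry import Polygon
>>> poly = Polygon([(0, 0), (100, 0), (100, 20), (0, 20)])
>>> poly.area * 1.5 / poly.length
12.5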
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or fewer
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1):
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature map is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
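(Geometrically, this is the point-to-segment distance via the triangle-area identity: with d1 = |Pa|, d2 = |Pb| and theta the angle at the map point P, the height over the base [ab] is d1 * d2 * sin(theta) / |ab|; when the perpendicular foot falls outside the segment, i.e. cosin < 0 above, the distance to the nearer endpoint is used instead.)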
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon threshold map on a canvas, as described in the DB paper
-
- Args:
- polygon : array of coord., to draw the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=np.bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
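(With the scales set above, the total objective reads L = 5 * balanced_bce_loss + dice_loss + 10 * l1_loss.)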
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Tuple[int, int, int] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num + 1):
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or fewer
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.bool_)
- seg_mask = np.ones(output_shape, dtype=np.bool_)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, fully masked
- seg_mask[idx] = False
- continue
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
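The masking above restricts the loss to pixels kept by seg_mask: the BCE is computed per pixel, then averaged over unmasked positions only. A minimal NumPy sketch of the same idea (illustrative only, not library code):

import numpy as np

def masked_bce(logits, target, mask, eps=1e-7):
    """Mean binary cross-entropy over unmasked pixels only."""
    p = 1.0 / (1.0 + np.exp(-logits))   # sigmoid
    p = np.clip(p, eps, 1.0 - eps)      # numerical safety
    bce = -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))
    return bce[mask].mean()             # boolean mask selects kept pixels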
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
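A hedged sketch of how these branches surface at inference time (no target given, model output requested), with `model` and `input_tensor` as in the doctest further down; the keys follow the code above:

>>> out = model(input_tensor, return_model_output=True)
>>> sorted(out.keys())
['boxes', 'out_map']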
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
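Since arch now accepts either a name or a model instance, a hedged usage sketch (assuming a working install with downloadable pretrained weights):

>>> from doctr.models import detection_predictor, db_resnet50
>>> model = db_resnet50(pretrained=True)
>>> predictor = detection_predictor(arch=model, assume_straight_pages=True)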
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized TFLite model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
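To sanity-check the serialized bytes, they can be loaded into a TFLite interpreter; a minimal sketch (assuming the quantized `model` from the docstring example above):

>>> import tensorflow as tf
>>> serialized = quantize_model(model, (224, 224, 3))
>>> interpreter = tf.lite.Interpreter(model_content=serialized)
>>> interpreter.allocate_tensors()
>>> interpreter.get_input_details()[0]['dtype']  # expected to be int8 here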
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Decodes the raw model output with CTC, then maps the decoded indices
- to characters with the label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
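For reference, the greedy CTC rule applied above merges consecutive repeats and then drops the blank token. A self-contained illustrative sketch (not library code):

def ctc_collapse(indices, blank):
    """Merge repeated symbols, then remove blanks (greedy CTC collapse)."""
    out, prev = [], None
    for idx in indices:
        if idx != prev and idx != blank:
            out.append(idx)
        prev = idx
    return out

# with blank=3: [0, 0, 3, 1, 1, 3, 1] -> [0, 1, 1]
assert ctc_collapse([0, 0, 3, 1, 1, 3, 1], blank=3) == [0, 1, 1]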
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of ground-truth words for the batch
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- model_output: predicted logits of the model
- gt: the encoded tensor with gt labels
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
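As with the detection zoo, arch may be a model instance rather than a string; a hedged usage sketch (assuming pretrained weights are available):

>>> from doctr.models import recognition_predictor, crnn_vgg16_bn
>>> reco_model = crnn_vgg16_bn(pretrained=True)
>>> predictor = recognition_predictor(arch=reco_model, batch_size=64)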
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example:
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: the offset added to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example:
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: the multiplicative contrast factor is picked in [1-delta, 1/(1-delta)] (factor<1 reduces contrast)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example:
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: the multiplicative saturation factor is picked in [1-delta, 1/(1-delta)] (factor<1 reduces saturation)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: the offset added to the hue channel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example:
- >>> from doctr.transforms import RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: lower bound of the JPEG quality, int in [0, 100]
- max_quality: upper bound of the JPEG quality, int in [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
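A minimal end-to-end sketch of how the modules above are meant to be chained. This is an assumption-laden illustration, not library documentation: it presumes the TensorFlow backend shown in this listing, that all classes are importable from doctr.transforms, and the parameter values are purely illustrative.

# Hedged usage sketch -- an augmentation pipeline built from the modules above.
import tensorflow as tf

from doctr.transforms import (
    Compose, Normalize, OneOf, RandomApply, RandomBrightness, RandomContrast, Resize,
)

pipeline = Compose([
    Resize((32, 32)),
    # Apply exactly one photometric distortion, half of the time
    RandomApply(OneOf([RandomBrightness(max_delta=0.1), RandomContrast(delta=0.2)]), p=0.5),
    Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),  # illustrative values
])
out = pipeline(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))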
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
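A hedged sketch of what string_match returns, derived directly from the definition above (assuming it is exposed from doctr.utils.metrics):

# raw / caseless / anyascii / unicase flags for a simple pair
from doctr.utils.metrics import string_match

flags = string_match("Hello", "hello")
# -> (False, True, False, True): only the raw and anyascii comparisons are case-sensitive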
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
        gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
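For the docstring example above, working through the definitions gives the following summary (a hedged sketch, computed by hand rather than run against this exact version):

metric = TextMatch()
metric.update(["Hello", "world"], ["hello", "world"])
print(metric.summary())
# -> {'raw': 0.5, 'caseless': 1.0, 'anyascii': 0.5, 'unicase': 1.0}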
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
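A small worked example for box_iou; the numbers follow from the intersection/union arithmetic above (a sketch, not verified output):

import numpy as np

boxes_1 = np.array([[0, 0, 100, 100]], dtype=np.float32)
boxes_2 = np.array([[0, 0, 50, 100]], dtype=np.float32)
# intersection = 50 * 100 = 5000, union = 10000 + 5000 - 5000 = 10000
print(box_iou(boxes_1, boxes_2))  # [[0.5]]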
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
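And the rotated counterpart, polygon_iou, with axis-aligned squares written as 4-point polygons (a hedged sketch; the overlap arithmetic is spelled out in the comments):

import numpy as np

polys_1 = np.array([[[0, 0], [100, 0], [100, 100], [0, 100]]], dtype=np.float32)
polys_2 = np.array([[[50, 0], [150, 0], [150, 100], [50, 100]]], dtype=np.float32)
# intersection = 50 * 100 = 5000, union = 10000 + 10000 - 5000 = 15000
print(polygon_iou(polys_1, polys_2))  # [[0.3333...]]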
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
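A hedged sketch of nms on three scored boxes; the middle box overlaps the top-scoring one beyond the threshold and is suppressed:

import numpy as np

boxes = np.array([
    [0, 0, 100, 100, 0.9],
    [5, 5, 105, 105, 0.8],    # IoU with the first box ~0.82 > 0.5 -> suppressed
    [200, 200, 300, 300, 0.7],
], dtype=np.float32)
print(nms(boxes, thresh=0.5))  # [0, 2]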
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
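A hedged sketch of the dispatch above: a 2-point tuple yields a Rectangle, a (4, 2) array yields a Polygon (coordinates are relative, as elsewhere in docTR; values are illustrative):

import numpy as np

rect = create_obj_patch(((0.1, 0.1), (0.4, 0.3)), page_dimensions=(200, 300), color=(0, 0, 1))
poly = create_obj_patch(np.array([[0.1, 0.1], [0.4, 0.1], [0.4, 0.3], [0.1, 0.3]]), page_dimensions=(200, 300))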
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
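For instance (hedged; lightness and saturation are randomized, so exact values vary from call to call):

palette = get_colors(3)  # 3 RGB tuples of floats in [0, 1], hues spread 120 degrees apart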
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
    image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
    # Create an mplcursors Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create an mplcursors Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
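A hedged usage sketch for draw_boxes (boxes are relative coordinates; the image is any HxWx3 array; the color value is illustrative):

import numpy as np

image = np.zeros((200, 300, 3), dtype=np.uint8)
boxes = np.array([[0.1, 0.1, 0.4, 0.3], [0.5, 0.5, 0.9, 0.8]], dtype=np.float32)
draw_boxes(boxes, image, color=(255, 0, 0))  # red rectangles, rendered via matplotlib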
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
-
-
+
+
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
-
-
+
+
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-.. autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
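-
-A hedged usage sketch (the keyword names below are assumed from the signature, not verified against this version; the vocab string is a toy subset):
-
-.. code:: python
-
-    from doctr.datasets import encode_sequences
-
-    # Map two words onto integer indices over a digits+lowercase vocab, padded to length 10
-    encoded = encode_sequences(
-        sequences=["hello", "world"],
-        vocab="0123456789abcdefghijklmnopqrstuvwxyz",
-        target_size=10,
-    )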
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same height belong to two distinct Lines, one per column).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
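-
-A hedged usage sketch of this API (the file path is a placeholder):
-
-.. code:: python
-
-    from doctr.documents import DocumentFile
-
-    pdf = DocumentFile.from_pdf("path/to/your/doc.pdf")
-    pages = pdf.as_images()      # one numpy array per page
-    words = pdf.get_words()      # word boxes embedded in the PDF, if any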
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the latest stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developer mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Whether performed jointly or separately, each task corresponds to a specific type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up, then measure its average speed on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-Experiments were run on an AWS c5.12xlarge instance (Intel Xeon Platinum 8275L CPU).
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (a binary segmentation map, for instance) into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors let you pass numpy images as inputs and get structured information back.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 30595 word-level crops, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 32, 128, 3] as a warm-up, then measure its average speed on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-Experiments were run on an AWS c5.12xlarge instance (Intel Xeon Platinum 8275L CPU).
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence) into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models used in the predictors are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the predictor, warm up the model, and then measure the average speed of the end-to-end predictor on the datasets with a batch size of 1.
-Experiments were run on an AWS c5.12xlarge instance (Intel Xeon Platinum 8275L CPU).
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-These architectures involve one stage of text detection and one stage of text recognition: the detection stage produces cropped images that are then passed to the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both the training and inference procedures. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module gathers non-core features that complement the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model's performance.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = CORD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-¶
-
-
-
-
-
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to the page’s size
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, text at the same height in the two columns forms two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and the confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF file as a bytes stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
W
- + diff --git a/v0.1.1/getting_started/installing.html b/v0.1.1/getting_started/installing.html index a488e9a030..af3b58193e 100644 --- a/v0.1.1/getting_started/installing.html +++ b/v0.1.1/getting_started/installing.html @@ -14,7 +14,7 @@ - +Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@Via Git¶
-
+
diff --git a/v0.1.1/index.html b/v0.1.1/index.html
index 76509686f5..3a06afc6d9 100644
--- a/v0.1.1/index.html
+++ b/v0.1.1/index.html
@@ -14,7 +14,7 @@
-
+
docTR documentation
@@ -445,7 +445,7 @@ Supported datasets
-
+
diff --git a/v0.1.1/modules/contrib.html b/v0.1.1/modules/contrib.html
index e99f6b3f74..7fb86b8b38 100644
--- a/v0.1.1/modules/contrib.html
+++ b/v0.1.1/modules/contrib.html
@@ -14,7 +14,7 @@
-
+
doctr.contrib - docTR documentation
@@ -380,7 +380,7 @@ Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶<
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
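
For reference, the refactored CORD interface shown in the hunk above can be exercised as follows. This is a minimal sketch, not part of the patch: `download=True` comes from the inherited `VisionDataset` kwargs, and it assumes a TensorFlow or PyTorch backend is installed.

from doctr.datasets import CORD

# Full dataset: each sample pairs an image with boxes and labels
train_set = CORD(train=True, download=True)
img, target = train_set[0]  # target: {"boxes": np.ndarray, "labels": [str, ...]}

# Recognition-only variant: pre-cropped word images paired with their text
reco_set = CORD(train=True, download=True, recognition_task=True)
crop, word = reco_set[0]

# Passing recognition_task=True and detection_task=True together raises a ValueError
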
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets.core - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
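
The `VisionDataset` removed with this file survives under `doctr.datasets.datasets` (see the updated imports in the surrounding hunks). A sketch of how a subclass drives the download/extract logic, against the signature shown above; the URL and archive name are hypothetical:

from doctr.datasets.datasets import VisionDataset  # post-refactor location

class MyArchiveDataset(VisionDataset):
    def __init__(self, **kwargs):
        super().__init__(
            "https://example.com/my_archive.zip",  # hypothetical URL
            "my_archive.zip",   # file name once downloaded
            None,               # file_hash: skip the SHA256 check
            True,               # extract_archive
            **kwargs,           # e.g. download=True
        )
        self.data = []  # subclasses fill this with (sample, target) pairs
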
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# # List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
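
As with CORD, the new FUNSD flags select what each sample carries. A hedged sketch of the three modes, assuming the archive is cached or `download=True` is passed:

from doctr.datasets import FUNSD

full = FUNSD(train=True, download=True)                         # (image, {"boxes", "labels"})
det = FUNSD(train=True, download=True, detection_task=True)     # (image, (N, 4) boxes)
reco = FUNSD(train=True, download=True, recognition_task=True)  # (word crop, text)
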
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before passing it to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
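
The loader hunk above replaces the `workers` argument with an optional `collate_fn` and adds `__len__`. A minimal sketch of the new surface, assuming the dataset yields same-shaped image tensors so `tf.stack` applies:

import tensorflow as tf
from doctr.datasets import CORD, DataLoader

train_set = CORD(train=True, download=True)

def my_collate(samples):
    # merge (image, target) pairs into a batched tensor plus a list of targets
    images, targets = zip(*samples)
    return tf.stack(images, axis=0), list(targets)

train_loader = DataLoader(train_set, batch_size=32, collate_fn=my_collate)
print(len(train_loader))                   # number of batches, via the new __len__
images, targets = next(iter(train_loader))
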
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates: 8 flat values -> a (4, 2) array of (x, y) corners
+ # (top left, top right, bottom right, bottom left); blank rows were already filtered out above
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
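
The reworked SROIE loop above vectorises the coordinate handling: each annotation row is parsed into an (N, 4, 2) corner array, then optionally collapsed to straight boxes. A sketch of the two geometries, with shapes as per the hunk:

from doctr.datasets import SROIE

straight = SROIE(train=True, download=True)
_, target = straight[0]
print(target["boxes"].shape)   # (N, 4): xmin, ymin, xmax, ymax

rotated = SROIE(train=True, download=True, use_polygons=True)
_, target = rotated[0]
print(target["boxes"].shape)   # (N, 4, 2): one (x, y) pair per corner
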
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionnary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or the char is still not in vocab, fall back to the unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
-
-
+
+
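
The `encode_sequences` rework above adds optional SOS/PAD symbols and dynamic sequence length. A worked sketch with a toy vocab; as the added checks enforce, `eos`, `sos` and `pad` must all lie outside the vocab's index range:

import numpy as np
from doctr.datasets.utils import decode_sequence, encode_sequences

vocab = "abc"
encoded = encode_sequences(
    ["ab", "cab"], vocab,
    eos=3, sos=4, pad=5,
    dynamic_seq_length=True,
)
# each row reads: SOS, characters, EOS, then PAD up to the common width
# e.g. "ab" -> [4, 0, 1, 3, 5, 5]

print(decode_sequence(np.array([0, 1, 2]), vocab))  # -> "abc"
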
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents.elements - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
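
The element tree deleted here now lives under `doctr.io.elements` (see the diff further below), but it composed as follows. A sketch against the signatures shown above, with made-up relative coordinates:

from doctr.documents.elements import Block, Document, Line, Page, Word

w1 = Word("Hello", 0.99, ((0.10, 0.10), (0.30, 0.15)))
w2 = Word("world", 0.98, ((0.32, 0.10), (0.50, 0.15)))
line = Line([w1, w2])  # geometry resolved to the smallest bbox enclosing its words
page = Page([Block(lines=[line])], page_idx=0, dimensions=(595, 842))
doc = Document([page])

print(doc.render())    # -> "Hello world"
print(doc.export())    # nested dict, one entry per exported key and child
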
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents.reader - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document, loaded with PyMuPDF (fitz)
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Default goes to 840 x 595 for A4 pdf,
- if you want to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified, where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages' annotations, represented as a list of (bounding box, value) tuples
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
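-
-# Hedged end-to-end sketch (not part of the original module): combining the
-# PDF helpers above, assuming a local file "doc.pdf" exists.
-# >>> pdf = PDF(read_pdf("doc.pdf"))
-# >>> pages = pdf.as_images(output_size=(1024, 726))
-# >>> words = pdf.get_words()  # one list of (bounding box, text) per page
-# >>> artefacts = pdf.get_artefacts()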
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
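-
-# Hedged usage sketch (not part of the original module): paths and raw bytes
-# can be mixed, since each item is dispatched to `read_img` individually.
-# >>> with open("path/to/your/page2.png", "rb") as f:
-# ...     pages = DocumentFile.from_images(["path/to/your/page1.png", f.read()])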
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to unshrink polygons
- max_candidates: maximum number of boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: The first parameter.
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to a ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
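-
- # Hedged numeric sketch (illustration only): for a 100 x 20 rectangle with
- # the default unclip_ratio of 1.5, the offset distance is
- # area * unclip_ratio / perimeter = (100 * 20) * 1.5 / 240 = 12.5 pixels,
- # applied outward on every side before taking the bounding rectangle.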
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove boxes that are too small
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1): # top-down pathway: fuse each deeper map into the shallower one
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature map is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
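-
- # Hedged worked example (illustration only): for the segment a=(0, 0),
- # b=(4, 0) and the point (2, 1): square_dist_1 = square_dist_2 = 5,
- # square_dist = 16, cosin = 0.6, square_sin = 0.64, and the formula yields
- # sqrt(5 * 5 * 0.64 / 16) = 1, i.e. the perpendicular distance.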
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon : array of coord., to draw the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool) # np.bool was removed in recent NumPy releases
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num): # label 0 is the background component
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
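-
-# Hedged shape note (illustration only): decoder_block(256, 128) maps a
-# (H, W, 256) feature map to (2H, 2W, 128): a 1x1 conv down to 256 // 4 = 64
-# channels, a stride-2 transposed conv, then a 1x1 conv up to 128 channels.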
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool) # np.bool was removed in recent NumPy releases
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@ Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
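+
+# Hedged usage sketch (not part of the patch): the reworked signature also
+# accepts a model instance instead of an architecture name.
+# >>> from doctr.models import db_resnet50, detection_predictor
+# >>> model = db_resnet50(pretrained=True)
+# >>> predictor = detection_predictor(arch=model, assume_straight_pages=False)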
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized TFLite model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
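-
-# Hedged usage sketch (not part of the original module): running the
-# serialized bytes returned above with the TFLite interpreter.
-# >>> interpreter = tf.lite.Interpreter(model_content=serialized_model)
-# >>> interpreter.allocate_tensors()
-# >>> input_details = interpreter.get_input_details()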
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
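-
- # Hedged illustration (not part of the original source): with a vocabulary
- # of size V, class index V acts as the CTC blank. Greedy decoding takes the
- # argmax at each timestep, merges consecutive repeats and drops blanks, so a
- # path like [a, a, blank, b] collapses to "ab"; the dense output is padded
- # with index V, which the embedding maps to "<eos>" downstream.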
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
- with label_to_idx mapping dictionnary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of ground-truth strings, encoded internally into gt labels and sequence lengths
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
-
\ No newline at end of file
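As context for the compute_loss removed above, here is a minimal, self-contained sketch of the same CTC objective via tf.nn.ctc_loss; the batch size, timestep count and vocabulary size are illustrative assumptions, not values taken from this diff.

import tensorflow as tf

# Illustrative sizes (assumptions, not from the diff)
batch_size, timesteps, vocab_size = 2, 16, 10

# Logits of shape (N, T, vocab_size + 1); the last class is the CTC blank
logits = tf.random.uniform((batch_size, timesteps, vocab_size + 1))
# Dense encoded labels padded to a fixed width, with their true lengths
labels = tf.constant([[1, 2, 3, 0], [4, 5, 0, 0]], dtype=tf.int32)
label_length = tf.constant([3, 2], dtype=tf.int32)
logit_length = tf.fill([batch_size], timesteps)  # every sample spans all timesteps

loss = tf.nn.ctc_loss(
    labels, logits, label_length, logit_length,
    logits_time_major=False, blank_index=vocab_size,
)
print(loss.shape)  # (2,): one loss value per batch element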
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
-        # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embeded_symbol: shape (N, embedding_units)
- embeded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embeded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
-            # logits: shape (N, rnn_units), glimpse: shape (N, C)
-            logits = tf.concat([logits, glimpse], axis=-1)
-            # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
-    """Implements a SAR architecture as described in `"Show, Attend and Read: A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
-    """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read: A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
-    """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read: A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
-    Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
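To make the removed AttentionModule concrete, here is a small standalone sketch of the same glimpse computation; the layer sizes and input shapes are assumptions chosen only for illustration.

import tensorflow as tf

# Assumed shapes for illustration only
N, H, W, C, attention_units = 2, 4, 8, 32, 16

features = tf.random.uniform((N, H, W, C))        # backbone feature map
hidden_state = tf.random.uniform((N, 1, 1, 64))   # current decoder state

# Project state and features into the attention space, as the module above does
feat_proj = tf.keras.layers.Conv2D(attention_units, 3, padding="same")(features)
state_proj = tf.keras.layers.Conv2D(attention_units, 1, use_bias=False)(hidden_state)

scores = tf.keras.layers.Conv2D(1, 1, use_bias=False)(tf.tanh(feat_proj + state_proj))
weights = tf.nn.softmax(tf.reshape(scores, (N, H * W)))  # one weight per spatial position
attention_map = tf.reshape(weights, (N, H, W, 1))

# Weighted sum over spatial positions: shape (N, C)
glimpse = tf.reduce_sum(features * attention_map, axis=[1, 2])
print(glimpse.shape)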
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
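A quick usage sketch of the reworked recognition zoo entry point shown above; the weights flag, batch size and crop shape are illustrative, and the predictor is assumed to accept a list of uint8 numpy arrays as in the docstring examples.

import numpy as np
from doctr.models import recognition_predictor

# Architecture by name, per the new signature; a model instance would also be accepted
predictor = recognition_predictor("crnn_vgg16_bn", pretrained=True, batch_size=64)

# One random word crop standing in for a real image
crops = [(255 * np.random.rand(32, 128, 3)).astype(np.uint8)]
words = predictor(crops)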
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+        Then, rotates the page before passing it again to the deep learning detection module.
+        Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+    >>> from doctr.models import kie_predictor
+    >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+        Then, rotates the page before passing it again to the deep learning detection module.
+        Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
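The zoo rewrite above introduces kie_predictor alongside a richer ocr_predictor signature; a hedged usage sketch follows (architecture names and options come from the signatures shown above, the input page is random data):

import numpy as np
from doctr.models import ocr_predictor, kie_predictor

page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)

# End-to-end OCR with explicit rotation handling options from the new signature
ocr = ocr_predictor(
    "db_resnet50", "crnn_vgg16_bn", pretrained=True,
    assume_straight_pages=False, export_as_straight_boxes=True,
)
result = ocr([page])

# The KIE variant follows the same construction pattern
kie = kie_predictor(pretrained=True)
kie_result = kie([page])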
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
-    """Applies a custom transformation function to a tensor
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
-    """Converts an RGB tensor (batch of images or image) to a 3-channel grayscale tensor
-
- Example::
-        >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
-    """Applies the following transformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
-        >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
-    Example::
-        >>> from doctr.transforms import RandomBrightness
-        >>> import tensorflow as tf
-        >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
-        max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
-    Example::
-        >>> from doctr.transforms import RandomContrast
-        >>> import tensorflow as tf
-        >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
-        delta: multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
-    Example::
-        >>> from doctr.transforms import RandomSaturation
-        >>> import tensorflow as tf
-        >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
-        delta: multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
-        >>> from doctr.transforms import RandomHue
-        >>> import tensorflow as tf
-        >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
-    """Randomly performs gamma correction for a tensor (batch of images or image)
-
-    Example::
-        >>> from doctr.transforms import RandomGamma
-        >>> import tensorflow as tf
-        >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
-    """Randomly adjust jpeg quality of a 3-dimensional RGB image
-
-    Example::
-        >>> from doctr.transforms import RandomJpegQuality
-        >>> import tensorflow as tf
-        >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
-        >>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
-        >>> import tensorflow as tf
-        >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
-        >>> from doctr.transforms import RandomApply, RandomGamma
-        >>> import tensorflow as tf
-        >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
-
\ No newline at end of file
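For reference, a minimal pipeline built from the removed modules; the mean/std values here are illustrative rather than the library defaults.

import tensorflow as tf
from doctr.transforms import Compose, Resize, Normalize

pipeline = Compose([
    Resize((32, 32)),
    Normalize(mean=(0.5, 0.5, 0.5), std=(1.0, 1.0, 1.0)),
])
out = pipeline(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
print(out.shape)  # (32, 32, 3)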
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
            gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
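To illustrate the tolerance levels the new string_match and TextMatch expose, a short sketch (assuming both names are importable from doctr.utils.metrics, as the code above suggests):

from doctr.utils.metrics import TextMatch, string_match

# (raw, caseless, anyascii, unicase) for a pair differing in case and accents
print(string_match("Café", "cafe"))  # (False, False, False, True)

metric = TextMatch()
metric.update(["Hello", "world"], ["hello", "world"])
print(metric.summary())  # {'raw': 0.5, 'caseless': 1.0, 'anyascii': 0.5, 'unicase': 1.0}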
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
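A quick numeric check of box_iou as defined above (the expected values are hand-computed):

import numpy as np
from doctr.utils.metrics import box_iou

gts = np.array([[0, 0, 100, 100]], dtype=np.float32)
preds = np.array([[0, 0, 50, 100], [200, 200, 300, 300]], dtype=np.float32)
print(box_iou(gts, preds))  # [[0.5, 0.0]]: half overlap, then disjoint boxes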
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
+
+
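The shapely-based polygon_iou above can be sanity-checked with two overlapping unit-area squares (expectation hand-computed):

import numpy as np
from doctr.utils.metrics import polygon_iou

# Rotated boxes as (N, 4, 2) arrays of corner coordinates
poly_a = np.array([[[0, 0], [2, 0], [2, 2], [0, 2]]], dtype=np.float32)
poly_b = np.array([[[1, 1], [3, 1], [3, 3], [1, 3]]], dtype=np.float32)
print(polygon_iou(poly_a, poly_b))  # [[0.14285715]], i.e. 1 / 7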
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+    """Perform non-max suppression, borrowed from `fast-rcnn <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
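And a small worked example for nms, with the survivors hand-checked against the IoU threshold:

import numpy as np
from doctr.utils.metrics import nms

# Straight boxes with confidence scores: (xmin, ymin, xmax, ymax, score)
boxes = np.array([
    [0, 0, 10, 10, 0.9],
    [1, 1, 10, 10, 0.8],   # IoU 0.81 with the first box -> suppressed
    [20, 20, 30, 30, 0.7],
], dtype=np.float32)
print(nms(boxes, thresh=0.5))  # [0, 2]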
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
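The update above replaces the old greedy `assign_pairs` matching with an optimal assignment via `scipy.optimize.linear_sum_assignment`. A minimal standalone illustration of the same logic, assuming a simple axis-aligned `box_iou` helper (doctr's own implementation may differ in details)::

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def box_iou(gts: np.ndarray, preds: np.ndarray) -> np.ndarray:
        """Pairwise IoU between (N, 4) ground truths and (M, 4) predictions, xyxy format."""
        lt = np.maximum(gts[:, None, :2], preds[None, :, :2])  # top-left of intersections
        rb = np.minimum(gts[:, None, 2:], preds[None, :, 2:])  # bottom-right of intersections
        inter = np.prod(np.clip(rb - lt, 0, None), axis=-1)
        area_g = np.prod(gts[:, 2:] - gts[:, :2], axis=-1)
        area_p = np.prod(preds[:, 2:] - preds[:, :2], axis=-1)
        return inter / (area_g[:, None] + area_p[None, :] - inter)

    gts = np.array([[0, 0, 100, 100]], dtype=float)
    preds = np.array([[0, 0, 70, 70], [110, 95, 200, 150]], dtype=float)
    iou_mat = box_iou(gts, preds)
    # Negate the IoU matrix so the Hungarian solver maximizes total IoU
    gt_idx, pred_idx = linear_sum_assignment(-iou_mat)
    matches = int((iou_mat[gt_idx, pred_idx] >= 0.5).sum())
    print(matches)  # 0 -- the best pair has IoU 0.49, just under the threshold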
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
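The `string_match` helper used above returns four booleans of increasing tolerance: exact, caseless, ASCII-transliterated, and transliterated + caseless. A rough stand-in, assuming the `anyascii` package handles transliteration (which is what recent doctr versions depend on)::

    from anyascii import anyascii

    def string_match(word1: str, word2: str):
        """Compare two strings at four levels of tolerance."""
        raw = word1 == word2
        caseless = word1.lower() == word2.lower()
        ascii_eq = anyascii(word1) == anyascii(word2)
        unicase = anyascii(word1).lower() == anyascii(word2).lower()
        return raw, caseless, ascii_eq, unicase

    print(string_match("Étendue", "etendue"))  # (False, False, False, True)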
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
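One subtlety shared by all three `update` methods above: `iou_mat` has shape (N ground truths, M predictions), so `max(axis=0)` keeps, for each prediction, its best-matching ground truth — exactly the quantity the meanIoU formula averages over M. A toy check with the docstring's numbers::

    import numpy as np

    iou_mat = np.array([[0.49, 0.0]])    # 1 ground truth (rows) x 2 predictions (columns)
    per_pred_best = iou_mat.max(axis=0)  # best IoU for each prediction: [0.49, 0.0]
    mean_iou = per_pred_best.sum() / iou_mat.shape[1]
    print(mean_iou)  # 0.245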
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
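As a quick usage sketch (importing `rect_patch` directly from the module; the page size and box values are made up for the example)::

    import matplotlib.pyplot as plt
    from doctr.utils.visualization import rect_patch

    # Relative geometry ((xmin, ymin), (xmax, ymax)) on a page of (height, width) = (200, 300)
    patch = rect_patch(((0.1, 0.2), (0.5, 0.4)), (200, 300), label="word", color=(0, 0, 1))
    fig, ax = plt.subplots()
    ax.add_patch(patch)
    ax.set_xlim(0, 300)
    ax.set_ylim(200, 0)  # image-style coordinates: y grows downward
    plt.show()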
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
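Hues are spread evenly around the color wheel while lightness and saturation get a small random jitter, so each class in the KIE visualization below receives a visually distinct color::

    from doctr.utils.visualization import get_colors

    colors = get_colors(3)  # three well-separated hues (roughly red, green, blue)
    # each entry is an (r, g, b) tuple of floats in [0, 1], ready for matplotlib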
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape than page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mpl Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape than page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mpl Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
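A minimal usage sketch (values are illustrative; note that `cv2.rectangle` draws on the image array in place)::

    import numpy as np
    from doctr.utils.visualization import draw_boxes

    image = np.full((200, 300, 3), 255, dtype=np.uint8)  # blank white page
    boxes = np.array([[0.1, 0.2, 0.5, 0.4], [0.6, 0.5, 0.9, 0.7]])  # relative (xmin, ymin, xmax, ymax)
    draw_boxes(boxes, image, color=(255, 0, 0))  # red rectangles drawn in place, then shown via pyplot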
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
-
-
+
+
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
-
-
+
+
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-.. autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
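For intuition, here is a heavily simplified, hypothetical stand-in for what `encode_sequences` does (the real function exposes a richer signature with EOS/SOS and padding options; every name below is illustrative only)::

    import numpy as np

    def encode_sequences(sequences, vocab, target_size, pad_idx):
        """Map each character to its vocab index and pad every row to target_size."""
        encoded = np.full((len(sequences), target_size), pad_idx, dtype=np.int64)
        for i, seq in enumerate(sequences):
            encoded[i, :len(seq)] = [vocab.index(c) for c in seq]
        return encoded

    vocab = "0123456789"
    print(encode_sequences(["42", "007"], vocab, target_size=4, pad_idx=len(vocab)))
    # [[ 4  2 10 10]
    #  [ 0  0  7 10]]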
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, Words at the same height but in different columns belong to two distinct Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developer mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Whether performed jointly or separately, each task corresponds to a specific type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up; we then measure the average speed of the model over 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.12xlarge AWS instance (Intel Xeon Platinum 8275L CPU) to perform the experiments.
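A minimal sketch of this benchmarking protocol in TensorFlow (same model and shapes as above; generating the random tensor inside the timed loop adds a small overhead that a stricter benchmark would exclude)::

    import time
    import tensorflow as tf
    from doctr.models import db_resnet50

    model = db_resnet50(pretrained=True)
    # Warm-up: 100 random tensors of the evaluation shape
    for _ in range(100):
        _ = model(tf.random.uniform([1, 1024, 1024, 3], maxval=1, dtype=tf.float32), training=False)
    # Timed run: 1000 batches of 1 frame
    start = time.perf_counter()
    for _ in range(1000):
        _ = model(tf.random.uniform([1, 1024, 1024, 3], maxval=1, dtype=tf.float32), training=False)
    fps = 1000 / (time.perf_counter() - start)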
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following (a short sketch follows the list):
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
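A minimal TensorFlow sketch of these three steps (the mean/std values below are illustrative placeholders, not the library's exact training statistics)::

    import tensorflow as tf

    def preprocess(images, target_size=(1024, 1024), mean=(0.8, 0.8, 0.8), std=(0.3, 0.3, 0.3)):
        """Resize (with potential deformation), batch, then normalize channel-wise."""
        resized = [tf.image.resize(img, target_size, method="bilinear") for img in images]
        batch = tf.stack(resized, axis=0)  # (N, H, W, 3)
        return (batch - tf.constant(mean)) / tf.constant(std)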
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors let you pass numpy images as inputs and get structured information back.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 30595 word-level crops, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 32, 128, 3] as a warm-up; we then measure the average speed of the model over 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.12xlarge AWS instance (Intel Xeon Platinum 8275L CPU) to perform the experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following (a short sketch follows the list):
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
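A sketch of steps 1 and 2, an aspect-preserving resize followed by zero-padding (batching and normalization then proceed as in the detection scheme)::

    import numpy as np
    import tensorflow as tf

    def preprocess_crop(img: np.ndarray, target=(32, 128)) -> tf.Tensor:
        """Aspect-preserving resize, then zero-padding to the target (H, W)."""
        h, w = img.shape[:2]
        scale = min(target[0] / h, target[1] / w)  # no deformation
        resized = tf.image.resize(img, (int(h * scale), int(w * scale)), method="bilinear")
        return tf.image.pad_to_bounding_box(resized, 0, 0, target[0], target[1])  # zero-pad bottom/right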
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed as follows: we instantiate the predictor, warm up the model, and then measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.12xlarge AWS instance (Intel Xeon Platinum 8275L CPU) to perform the experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection stage produces cropped images that are then passed to the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
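Assuming the call was traced before saving, as in the export snippet above, the restored object can be invoked directly for inference:

 >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
 >>> _ = model(input_t, training=False)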
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both the training and inference procedures. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
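For instance, a typical pipeline built from these blocks (class names as listed above; constructor arguments are indicative and may vary across versions)::

    from doctr.transforms import ColorInversion, Compose, OneOf, RandomApply, RandomBrightness, RandomContrast, Resize

    transfo = Compose([
        Resize((32, 128)),                              # fixed target size
        RandomApply(ColorInversion(), p=0.1),           # occasionally invert colors
        OneOf([RandomBrightness(), RandomContrast()]),  # pick one photometric jitter at random
    ])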
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module gathers non-core features that complement the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model's performance.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
-
- doctr.datasets - docTR documentation
-
-doctr.datasets¶
-Whether it is for training or evaluation, having predefined objects to access datasets in your preferred framework
-can save you significant time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
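-Since the class is abstract, concrete datasets subclass it and forward the download parameters. A minimal sketch
-(the URL and file name below are hypothetical placeholders):
->>> from doctr.datasets.core import VisionDataset
->>> class MyDataset(VisionDataset):
-...     def __init__(self, train: bool = True, **kwargs):
-...         # forward the download parameters documented above to the abstract base
-...         super().__init__(url="https://example.com/my_dataset.zip",
-...                          file_name="my_dataset.zip", extract_archive=True, **kwargs)
-...         self.train = train
-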
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
-
-Data Loading¶
-Each dataset has its own way to load a sample, but batch aggregation and the underlying iterator are handled by another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
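-Once wrapped, a full epoch is consumed with a plain loop; a minimal sketch reusing the train_loader built in the
-example above:
->>> for images, targets in train_loader:
-...     pass  # forward pass, loss computation, etc.
-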
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-Name           size  characters
-digits         10    0123456789
-ascii_letters  52    abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-punctuation    32    !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-currency       5     £€¥¢฿
-latin          96    0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-french         154   0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as a mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
-
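-As a minimal sketch with the digits vocab: each character is mapped to its index in the vocab, and the output is
-padded up to target_size (using the eos encoding as the padding value, by assumption here):
->>> from doctr.datasets import encode_sequences
->>> encoded = encode_sequences(sequences=["12", "345"], vocab="0123456789", target_size=4)
-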
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
-
- doctr.documents - docTR documentation
-
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size
-
-
-
-
-
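-A Word can also be instantiated directly; a minimal sketch, where the geometry values are illustrative relative
-coordinates:
->>> from doctr.documents import Word
->>> word = Word(value="hello", confidence=0.98, geometry=((0.1, 0.1), (0.3, 0.2)))
-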
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, text at the same height in the two columns forms two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and the confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
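-The hierarchy can be assembled bottom-up; a minimal sketch reusing the word built above (assuming Document takes
-its pages as a list):
->>> from doctr.documents import Line, Block, Page, Document
->>> page = Page(blocks=[Block(lines=[Line(words=[word])])], page_idx=0, dimensions=(896, 672))
->>> doc = Document(pages=[page])
-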
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert its pages into images in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
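-The output size and channel order can be forced at read time; a minimal sketch using the parameters above
-(output_size in H x W format, BGR channel order):
->>> page = read_img("path/to/your/doc.jpg", output_size=(1024, 1024), rgb_output=False)
-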
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF document
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read one or several image files and convert them into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuples (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
Supported contribution modules
-
+
diff --git a/v0.1.1/modules/datasets.html b/v0.1.1/modules/datasets.html
index 456e10b172..380a986793 100644
--- a/v0.1.1/modules/datasets.html
+++ b/v0.1.1/modules/datasets.html
@@ -14,7 +14,7 @@
-
+
doctr.datasets - docTR documentation
@@ -1081,7 +1081,7 @@ Returns:
-
+
diff --git a/v0.1.1/modules/io.html b/v0.1.1/modules/io.html
index 01eadaa4b8..24c41954be 100644
--- a/v0.1.1/modules/io.html
+++ b/v0.1.1/modules/io.html
@@ -14,7 +14,7 @@
-
+
doctr.io - docTR documentation
@@ -760,7 +760,7 @@ Returns:¶
-
+
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
-
+
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
-
+
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
-
+
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶<
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
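
The cord.py rework above changes how the dataset is consumed: the constructor gains use_polygons, recognition_task and detection_task, and the two task flags are mutually exclusive. A minimal usage sketch based only on the constructor and docstring shown in the hunk (not verified against a live download):

from doctr.datasets import CORD

# Default mode: each sample pairs an image with boxes and labels
train_set = CORD(train=True, download=True)
img, target = train_set[0]  # target: dict(boxes=np.ndarray, labels=[str, ...])

# Recognition mode: samples become (word crop, transcription) pairs,
# produced by crop_bboxes_from_image as in the hunk above
reco_set = CORD(train=True, download=True, recognition_task=True)
crop, text = reco_set[0]

# Detection mode: samples become (image, boxes) pairs; setting both task
# flags raises the ValueError introduced in this diff
det_set = CORD(train=True, download=True, detection_task=True)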
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets.core - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
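
The page above records the deletion of doctr.datasets.core; the surviving dataset modules now import VisionDataset from .datasets instead (see the updated imports in the cord hunk). For reference, a sketch of the download-and-extract flow the removed class implemented, using only the signature and cache layout visible in the deleted source (the URL and file name are placeholders):

from doctr.datasets.core import VisionDataset  # module removed in this diff

ds = VisionDataset(
    url="https://example.com/dataset.zip",  # placeholder, not a real asset
    file_name="dataset.zip",
    file_hash=None,         # skip the SHA256 check in this sketch
    extract_archive=True,   # unzip next to the cached archive
    download=True,          # fetch into ~/.cache/doctr/datasets if missing
)
print(len(ds))  # 0 until a subclass populates ds.data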
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
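
The use_polygons branch added to FUNSD above rewrites each straight [xmin, ymin, xmax, ymax] box as its four corners. The same conversion as a standalone sketch (the helper name is ours, not part of the diff):

import numpy as np

def box_to_polygon(box):
    # [xmin, ymin, xmax, ymax] -> (4, 2) corners: top-left, top-right,
    # bottom-right, bottom-left, mirroring the list comprehension above
    xmin, ymin, xmax, ymax = box
    return np.array([[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]], dtype=np.float32)

print(box_to_polygon([10, 20, 110, 60]))
# [[ 10.  20.]
#  [110.  20.]
#  [110.  60.]
#  [ 10.  60.]]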
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before passing them to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
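The reworked loader drops the multithreaded `workers` argument in favour of a plain sequential `map`, and lets callers override batching through `collate_fn`. A minimal sketch of plugging in a custom collate function, assuming a dataset that yields (image, target) pairs (the `my_collate` helper below is hypothetical):
>>> import tensorflow as tf
>>> from doctr.datasets import CORD, DataLoader
>>> def my_collate(samples):  # hypothetical: stack images, keep targets as a plain list
...     images, targets = zip(*samples)
...     return tf.stack(images, axis=0), list(targets)
>>> train_set = CORD(train=True, download=True)
>>> train_loader = DataLoader(train_set, batch_size=32, shuffle=True, collate_fn=my_collate)
>>> len(train_loader)  # number of batches, courtesy of the new __len__
>>> images, targets = next(iter(train_loader))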
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
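As the new constructor makes explicit, `recognition_task` and `detection_task` are mutually exclusive and change what each sample holds. A short sketch of the three modes, assuming the archives download correctly:
>>> from doctr.datasets import SROIE
>>> # default: full OCR targets, i.e. a dict of boxes and labels per image
>>> ds = SROIE(train=True, download=True)
>>> img, target = ds[0]  # target: {"boxes": np.ndarray, "labels": [str, ...]}
>>> # recognition: pre-cropped word images paired with their label strings
>>> rec_ds = SROIE(train=True, download=True, recognition_task=True)
>>> crop, label = rec_ds[0]
>>> # detection: images paired with box coordinates only
>>> det_ds = SROIE(train=True, download=True, detection_task=True)
>>> img, boxes = det_ds[0]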
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionnary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char is still not in vocab, fall back to the unknown character
char = unknown_char
translated += char
return translated
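A quick illustration of `translate`, assuming a vocab name present in `VOCABS` (e.g. "french"); the exact output depends on that vocab's contents, but characters with no ASCII fallback end up as the unknown character:
>>> out = translate("Crème brûlée™", "french", unknown_char="■")  # "™" has no ASCII fallback -> "■"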
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
-
-
+
+
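To make the new padding semantics concrete, here is a sketch of a round trip through `encode_sequences` and `decode_sequence`, with a toy vocabulary so that the `eos`, `sos` and `pad` indices sit outside it (all values below are illustrative):
>>> import numpy as np
>>> vocab = "abc"
>>> encoded = encode_sequences(["ab", "c"], vocab, eos=3, sos=4, pad=5, dynamic_seq_length=True)
>>> encoded[0]  # [sos, "a", "b", eos, pad]
array([4, 0, 1, 3, 5], dtype=int32)
>>> decode_sequence(np.array([0, 1, 2]), vocab)
'abc'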
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents.elements - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
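The deleted page above documents the legacy v0.2.0 element hierarchy (Word -> Line -> Block -> Page -> Document). A minimal sketch of assembling that tree by hand against the legacy API, with made-up geometries in relative coordinates:
>>> from doctr.documents.elements import Word, Line, Block, Page, Document
>>> w1 = Word("Hello", 0.99, ((0.1, 0.1), (0.3, 0.2)))
>>> w2 = Word("world", 0.98, ((0.35, 0.1), (0.55, 0.2)))
>>> line = Line([w1, w2])  # geometry resolved as the enclosing bbox of the words
>>> page = Page(blocks=[Block(lines=[line])], page_idx=0, dimensions=(595, 842))
>>> doc = Document(pages=[page])
>>> doc.render()
'Hello world'
>>> export = doc.export()  # nested dict of values, confidences and geometries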
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents.reader - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document, opened as a fitz.Document
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 pdf;
- to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
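In other words, `convert_page_to_numpy` derives its fitz transform from the requested output size: for a target (H, W) and a page MediaBox of width w and height h, the scales are (W / w, H / h); without a target size, the default (2, 2) renders at roughly 144 dpi instead of the 72 dpi fitz baseline. A usage sketch against this legacy API, assuming "doc.pdf" exists (the output shape matches the request up to rounding):
>>> from doctr.documents.reader import read_pdf, convert_page_to_numpy
>>> doc = read_pdf("doc.pdf")
>>> page_img = convert_page_to_numpy(doc[0], output_size=(1024, 726))
>>> page_img.shape  # approximately (1024, 726, 3)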
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
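Likewise, the deleted reader page documents the legacy v0.2.0 entry points built on OpenCV, PyMuPDF and WeasyPrint. A sketch of typical usage under that legacy API, assuming the files exist:
>>> from doctr.documents import DocumentFile
>>> pages = DocumentFile.from_images(["page1.png", "page2.png"])  # list of H x W x 3 ndarrays
>>> pdf = DocumentFile.from_pdf("doc.pdf")                        # PDF wrapper around a fitz document
>>> images = pdf.as_images()                                      # rendered pages as ndarrays
>>> words = pdf.get_words()                                       # [(bbox, value), ...] for each page
>>> web_doc = DocumentFile.from_url("https://www.yoursite.com")   # web page rendered to PDF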
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.models.detection.differentiable_binarization - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to expand (unclip) polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map (p_map) at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: the polygon vertices to expand
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
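The offset distance used just above comes straight from the DB paper: each polygon is expanded proportionally to its area-to-perimeter ratio. With $A$ the polygon area, $L$ its perimeter and $r$ the unclip ratio:

D' = \frac{A \times r}{L}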
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove boxes that are too small
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1): # top-down: add each upsampled coarser map to the finer one
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature maps is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
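For reference, `compute_distance` is a vectorized point-to-segment distance: a law-of-cosines term decides, per pixel, whether to use the triangle-based distance or the nearer endpoint. Mirroring the code, with $d_1$, $d_2$ the squared distances from a pixel to $a$ and $b$, and $d_{ab}$ the squared segment length:

c = \frac{d_{ab} - d_1 - d_2}{2\sqrt{d_1 d_2}}, \qquad
\mathrm{dist} =
\begin{cases}
\sqrt{\dfrac{d_1 d_2\,(1 - c^2)}{d_{ab}}} & \text{if } c \geq 0 \\
\sqrt{\min(d_1, d_2)} & \text{otherwise}
\end{cases}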
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon : array of coordinates defining the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=np.bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
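For intuition, a minimal self-contained sketch of the 3:1 hard-negative mining performed in the balanced BCE above: all positive losses are kept, but only the top-k negative losses with k = min(#negatives, 3 * #positives). The tensor values are illustrative only.
import tensorflow as tf

seg = tf.constant([1., 0., 0., 0., 0., 0., 0., 1.])  # flattened masked targets
bce = tf.random.uniform((8,))                        # per-pixel BCE losses
pos_count = tf.reduce_sum(seg)                       # 2 positives here
neg_k = tf.cast(tf.minimum(tf.reduce_sum(1. - seg), 3. * pos_count), tf.int32)
hard_neg, _ = tf.nn.top_k(bce * (1. - seg), neg_k)   # 6 hardest negatives
balanced = (tf.reduce_sum(bce * seg) + tf.reduce_sum(hard_neg)) \
    / (pos_count + tf.cast(neg_k, tf.float32) + 1e-6)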
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
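For reference, a hedged sketch of both execution paths of the call signature above — inference (post-processed boxes) and training (loss computation). The target values are illustrative and follow the boxes/flags format documented earlier.
import numpy as np
import tensorflow as tf
from doctr.models import db_resnet50

model = db_resnet50(pretrained=False)
x = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)

# Inference path: without a target, post-processed boxes are returned
out = model(x, return_model_output=True)
prob_map, boxes = out["out_map"], out["boxes"]

# Training path: relative boxes plus ambiguity flags for each image
target = [{
    "boxes": np.array([[0.1, 0.1, 0.4, 0.2]], dtype=np.float32),
    "flags": np.array([False]),
}]
loss = model(x, target=target, training=True)["loss"]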
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: probability map output by the LinkNet model
- bitmap: binarized map computed from pred
-
- Returns:
- np array of boxes for the bitmap, each box being a 5-element list
- containing xmin, ymin, xmax, ymax, score in relative coordinates
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num): # label 0 is the background
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
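A minimal standalone sketch of the connected-component extraction above, assuming a probability map pred and its binarized bitmap of the same spatial shape (the threshold value mirrors bin_thresh; the map contents are illustrative).
import cv2
import numpy as np

pred = np.random.rand(256, 256).astype(np.float32)  # illustrative probability map
bitmap = (pred > 0.15).astype(np.uint8)             # binarized with bin_thresh
n_labels, label_img = cv2.connectedComponents(bitmap, connectivity=4)
for label in range(1, n_labels):                    # label 0 is the background
    ys, xs = np.where(label_img == label)
    x, y, w, h = cv2.boundingRect(np.stack([xs, ys], axis=-1).astype(np.int32))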
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.bool_)
- seg_mask = np.ones(output_shape, dtype=np.bool_)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
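A toy illustration of the relative-to-absolute box conversion used in compute_target, assuming an 8 x 8 output map; the box values are illustrative.
import numpy as np

boxes = np.array([[0.25, 0.25, 0.75, 0.5]])  # xmin, ymin, xmax, ymax in [0, 1]
H, W = 8, 8
abs_boxes = boxes.copy()
abs_boxes[:, [0, 2]] *= W
abs_boxes[:, [1, 3]] *= H
abs_boxes = abs_boxes.round().astype(np.int32)  # -> [[2, 2, 6, 4]]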
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
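Since arch now accepts a model instance as well as a name, a usage sketch under the assumption that a compatible backend is installed:
import numpy as np
from doctr.models import db_resnet50, detection_predictor

model = db_resnet50(pretrained=True)
predictor = detection_predictor(arch=model, assume_straight_pages=True)
page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
out = predictor([page])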
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized TFLite model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
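A hedged sketch of exercising the serialized bytes returned above with the standard tf.lite interpreter; shapes and dtypes are read from the model itself, and the zero input is purely illustrative.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_content=serialized_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
result = interpreter.get_tensor(out["index"])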
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Decodes the raw model output with CTC, then maps the predicted indices back
- to characters with the label-to-index mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
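A toy sketch of the greedy CTC rule applied above: take the argmax class per timestep, collapse repeats, then drop the blank (index len(vocab), matching the default_value used in ctc_decoder). The vocabulary and timestep values are illustrative.
vocab = "ab"
blank = len(vocab)              # CTC reserves the extra class as blank
steps = [0, 0, blank, 1, 1]     # per-timestep argmax over the logits
decoded, prev = [], blank
for s in steps:
    if s != prev and s != blank:
        decoded.append(vocab[s])
    prev = s
assert "".join(decoded) == "ab"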
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of target strings, encoded internally into gt labels and sequence lengths
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H * W) -> (N, 1)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, 1)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + 1) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
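A toy illustration of the EOS masking above: per-timestep losses beyond each sequence's length (EOS included) are zeroed before averaging. The loss values are illustrative.
import tensorflow as tf

cce = tf.ones((2, 4))                   # per-timestep losses, T = 4
seq_len = tf.constant([2, 3]) + 1       # +1 for the <eos> token
mask_2d = tf.sequence_mask(seq_len, 4)  # [[T, T, T, F], [T, T, T, T]]
masked = tf.where(mask_2d, cce, tf.zeros_like(cce))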
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example:
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
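A sketch of running the updated predictor on word crops; the crop size is illustrative, and the preprocessor resizes with aspect-ratio preservation as configured above.
import numpy as np
from doctr.models import recognition_predictor

reco = recognition_predictor(arch="crnn_vgg16_bn", pretrained=True)
crops = [(255 * np.random.rand(32, 128, 3)).astype(np.uint8)]
words = reco(crops)  # list of decoded predictions, one per crop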
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the general page orientation
+ based on the median line orientation of the segmentation map.
+ Then, rotates the page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the general page orientation
+ based on the median line orientation of the segmentation map.
+ Then, rotates the page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `KIEPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
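Taken together, the two public entry points now expose the same options; a minimal sketch (assuming pretrained weights are available for these architectures):

    import numpy as np
    from doctr.models import ocr_predictor, kie_predictor

    page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)

    # End-to-end OCR with the new defaults ('fast_base' detection, 'crnn_vgg16_bn' recognition)
    ocr = ocr_predictor(pretrained=True, assume_straight_pages=True)
    doc = ocr([page])

    # KIE variant, exercising some of the shared keyword arguments
    kie = kie_predictor("db_resnet50", "crnn_vgg16_bn", pretrained=True, detect_language=True)
    kie_doc = kie([page])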
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Brightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- p: probability to apply transformation
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Contrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Saturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Hue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Gamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = JpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = OneOf([JpegQuality(), Gamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = RandomApply(Gamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.base - docTR documentation
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.tensorflow - docTR documentation
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
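The four booleans form increasingly tolerant match levels; a quick sketch (expected values assume anyascii transliterates 'é' to 'e'):

    from doctr.utils.metrics import string_match

    string_match("Hello", "Hello")  # (True, True, True, True)
    string_match("Hello", "hello")  # (False, True, False, True): only case differs
    string_match("café", "cafe")    # (False, False, True, True): only accents differ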
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
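As a sanity check on the intersection/union computation above, a small worked example (a sketch):

    import numpy as np
    from doctr.utils.metrics import box_iou

    a = np.array([[0.0, 0.0, 100.0, 100.0]])
    b = np.array([[0.0, 0.0, 50.0, 50.0], [200.0, 200.0, 300.0, 300.0]])
    box_iou(a, b)  # [[0.25, 0.0]]: 2500 / (10000 + 2500 - 2500) = 0.25, then no overlap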
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
+
+
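Since polygon_iou goes through shapely polygons, any quadrilateral works; a worked sketch with two overlapping squares:

    import numpy as np
    from doctr.utils.metrics import polygon_iou

    sq1 = np.array([[[0, 0], [2, 0], [2, 2], [0, 2]]], dtype=np.float32)
    sq2 = np.array([[[1, 1], [3, 1], [3, 3], [1, 3]]], dtype=np.float32)
    polygon_iou(sq1, sq2)  # [[0.1428...]]: intersection 1, union 4 + 4 - 1 = 7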
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
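A short sketch of the greedy suppression loop above (the score sits in the last column):

    import numpy as np
    from doctr.utils.metrics import nms

    boxes = np.array([
        [0.0, 0.0, 100.0, 100.0, 0.9],      # kept: highest score
        [5.0, 5.0, 105.0, 105.0, 0.8],      # dropped: IoU with box 0 is ~0.82 > 0.5
        [200.0, 200.0, 300.0, 300.0, 0.7],  # kept: no overlap with box 0
    ])
    nms(boxes, thresh=0.5)  # indices [0, 2] are kept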
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
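With the new use_polygons flag, the same metric object accepts (N, 4, 2) rotated boxes; a sketch (absolute coordinates used here for readability, relative ones behave the same way):

    import numpy as np
    from doctr.utils.metrics import LocalizationConfusion

    metric = LocalizationConfusion(iou_thresh=0.5, use_polygons=True)
    gts = np.array([[[0, 0], [10, 0], [10, 10], [0, 10]]], dtype=np.float32)
    preds = np.array([[[1, 1], [11, 1], [11, 11], [1, 11]]], dtype=np.float32)
    metric.update(gts, preds)
    metric.summary()  # (1.0, 1.0, 0.68): IoU = 81 / 119 ~ 0.68 >= 0.5, so the pair matches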
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mlp Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mplcursors Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
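For reference, here is a minimal usage sketch of the `draw_boxes` helper added above. It is not part of the diff itself, and the image and boxes are illustrative placeholders:

>>> import numpy as np
>>> from doctr.utils.visualization import draw_boxes
>>> image = np.zeros((100, 200, 3), dtype=np.uint8)
>>> boxes = np.array([[0.1, 0.1, 0.4, 0.3]])  # one relative (xmin, ymin, xmax, ymax) box
>>> draw_boxes(boxes, image, color=(0, 255, 0))  # overlays the rectangle and shows the image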
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-.. autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
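For context, a short usage sketch of the loader documented above, mirroring the doctest from the rendered page later in this diff (with FUNSD used consistently for both import and instantiation):

>>> from doctr.datasets import FUNSD, DataLoader
>>> train_set = FUNSD(train=True, download=True)
>>> train_loader = DataLoader(train_set, batch_size=32)
>>> images, targets = next(iter(train_loader))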
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
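A minimal sketch of how encode_sequences is typically called; the vocab string and target size below are illustrative assumptions:

>>> from doctr.datasets import encode_sequences
>>> encoded = encode_sequences(sequences=["hello", "world"], vocab="abcdefghijklmnopqrstuvwxyz", target_size=10)
>>> encoded.shape  # one padded row per input sequence
(2, 10)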
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
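A compact sketch tying the readers above together (file paths are placeholders):

>>> from doctr.documents import DocumentFile
>>> pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
>>> pages = pdf_doc.as_images()  # list of H x W x 3 numpy arrays
>>> words = pdf_doc.get_words()  # per-page lists of (bounding box, value) tuples
>>> imgs = DocumentFile.from_images(["path/to/page1.png", "path/to/page2.png"])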
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developer mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Whether they are performed jointly or separately, each task corresponds to a specific type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up, then measure its average speed over 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used an AWS c5.12xlarge instance (Intel Xeon Platinum 8275L CPU) to perform the experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors let you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
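A minimal usage sketch for the detection predictor (argument names follow the v0.2-era API documented here; the random page is a stand-in for a real image):

>>> import numpy as np
>>> from doctr.models.detection import detection_predictor
>>> model = detection_predictor('db_resnet50', pretrained=True)
>>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([input_page])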
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 30595 word-level crops, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 32, 128, 3] as a warm-up, then measure its average speed over 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used an AWS c5.12xlarge instance (Intel Xeon Platinum 8275L CPU) to perform the experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
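Likewise, a minimal sketch for the recognition predictor (the crop shape matches the input shapes listed in the table above):

>>> import numpy as np
>>> from doctr.models.recognition import recognition_predictor
>>> model = recognition_predictor('crnn_vgg16_bn', pretrained=True)
>>> crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)
>>> out = model([crop])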
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All the recognition models used in these predictors are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the predictor, warm up the model, and then measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used an AWS c5.12xlarge instance (Intel Xeon Platinum 8275L CPU) to perform the experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The output of the detection stage is used to produce cropped images that are passed to the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
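A usage sketch for the end-to-end predictor, following the v0.2-era convention of passing a list of documents, each being a list of pages:

>>> import numpy as np
>>> from doctr.models import ocr_predictor
>>> model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
>>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([[input_page]])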
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
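A hedged sketch of how these compression helpers were typically invoked; the input_shape argument of quantize_model is an assumption based on the v0.2-era doctr.models.export module:

>>> from doctr.models import db_resnet50
>>> from doctr.models.export import convert_to_tflite, convert_to_fp16, quantize_model
>>> model = db_resnet50(pretrained=True)
>>> tflite_bytes = convert_to_tflite(model)  # full-precision TFLite serialization
>>> fp16_bytes = convert_to_fp16(model)  # half-precision serialization
>>> int8_bytes = quantize_model(model, (1024, 1024, 3))  # post-training quantization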
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both the training and inference procedures. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
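A minimal composition sketch (mirroring the doctest style used elsewhere in these docs; the input tensor is a random placeholder):

>>> import tensorflow as tf
>>> from doctr.transforms import Compose, Resize, ColorInversion, RandomApply
>>> transfo = Compose([Resize((32, 32)), RandomApply(ColorInversion(), p=0.5)])
>>> out = transfo(tf.random.uniform(shape=[64, 64, 3], maxval=1, dtype=tf.float32))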
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module gathers non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model's performance.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
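A usage sketch for the localization metric, adapted from its upstream docstring (boxes are absolute (xmin, ymin, xmax, ymax) coordinates):

>>> import numpy as np
>>> from doctr.utils.metrics import LocalizationConfusion
>>> metric = LocalizationConfusion(iou_thresh=0.5)
>>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70]]))
>>> recall, precision, mean_iou = metric.summary()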
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
- doctr.datasets - docTR documentation
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before passing it to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-¶
-
-
-
-
-
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
- doctr.documents - docTR documentation
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to the page's size
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into images in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF file as a bytes stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert it into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
@@ -364,7 +381,7 @@ Contents
Returns:¶
diff --git a/v0.1.1/modules/models.html b/v0.1.1/modules/models.html
index c465cc0586..91b8810a6a 100644
--- a/v0.1.1/modules/models.html
+++ b/v0.1.1/modules/models.html
@@ -14,7 +14,7 @@
doctr.models - docTR documentation
@@ -1612,7 +1612,7 @@ Args:¶
diff --git a/v0.1.1/modules/transforms.html b/v0.1.1/modules/transforms.html
index 30f7a2631a..c5ead3f3ce 100644
--- a/v0.1.1/modules/transforms.html
+++ b/v0.1.1/modules/transforms.html
@@ -14,7 +14,7 @@
doctr.transforms - docTR documentation
@@ -835,7 +835,7 @@ Args:¶<
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
Search - docTR documentation
@@ -340,7 +340,7 @@
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
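A minimal usage sketch of the CORD signature introduced above (assuming the `doctr.datasets` API exactly as shown in this diff; download size and cache location depend on your environment):

>>> from doctr.datasets import CORD
>>> # full dataset: both task flags left False, targets carry boxes and labels
>>> train_set = CORD(train=True, download=True, use_polygons=True)
>>> img, target = train_set[0]  # target: dict with "boxes" of shape (N, 4, 2) and "labels"
>>> # recognition variant: word crops paired with their transcriptions
>>> reco_set = CORD(train=True, download=True, recognition_task=True)
>>> crop, label = reco_set[0]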
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets.core - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
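The `doctr.datasets.core` page deleted above still shows the pattern every dataset in this diff follows: download an archive, verify its SHA256, extract it under ~/.cache/doctr/datasets, then populate `self.data`. A sketch against that removed API (the URL and file name below are hypothetical, and the import path is only valid for the old releases this page documented):

>>> from doctr.datasets.core import VisionDataset  # removed module, old releases only
>>> class MyZipSet(VisionDataset):
...     def __init__(self) -> None:
...         super().__init__(
...             url="https://example.com/myset.zip",  # hypothetical archive URL
...             file_name="myset.zip",
...             file_hash=None,        # skip checksum verification
...             extract_archive=True,
...             download=True,         # fetch the archive if absent from the cache
...         )
>>> len(MyZipSet())  # 0 until a subclass populates self.data
0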
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
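FUNSD gains the same three flags as CORD; a short sketch (same environment caveats as above):

>>> from doctr.datasets import FUNSD
>>> det_set = FUNSD(train=True, download=True, detection_task=True)
>>> img, boxes = det_set[0]  # boxes: float32, (N, 4), or (N, 4, 2) with use_polygons=True
>>> # the two task flags are mutually exclusive; enabling both raises the ValueError above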
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
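The `DataLoader` rewrite above replaces multithreaded fetching (`workers`) with plain sequential mapping plus an optional `collate_fn`, and exposes the batch count through `__len__`. A sketch:

>>> from doctr.datasets import FUNSD, DataLoader
>>> train_set = FUNSD(train=True, download=True)
>>> loader = DataLoader(train_set, shuffle=True, batch_size=16, drop_last=True)
>>> n_batches = len(loader)  # now available via __len__
>>> images, targets = next(iter(loader))
>>> # override collation, e.g. to keep samples as plain lists instead of stacked tensors
>>> raw_loader = DataLoader(train_set, batch_size=16, collate_fn=lambda samples: list(zip(*samples)))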
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder the 8 raw values into a (4, 2) array of (x, y) coordinates:
+ # top left, top right, bottom right, bottom left corners (blank lines were filtered above)
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
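SROIE now parses the 8 raw coordinates of each word into a (4, 2) polygon and only flattens them to straight boxes when `use_polygons` is left False; a sketch:

>>> from doctr.datasets import SROIE
>>> box_set = SROIE(train=True, download=True)
>>> img, target = box_set[0]   # target["boxes"]: (N, 4) as xmin, ymin, xmax, ymax
>>> poly_set = SROIE(train=True, download=True, use_polygons=True)
>>> img, target = poly_set[0]  # target["boxes"]: (N, 4, 2) corner coordinates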
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char is still not in vocab, return the unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
-
-
+
+
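To make the new `sos`/`pad` handling in `encode_sequences` concrete, here is a worked example (checked against the logic above: with `pad` set, one EOS is appended to each word before padding, and the roll plus first-column assignment prepends SOS; import from `doctr.datasets.utils` if the package-level re-export differs in your version):

>>> from doctr.datasets import encode_sequences
>>> vocab = "abc"  # indices 0..2 are taken, so eos/sos/pad must lie outside
>>> encode_sequences(["ab", "c"], vocab, eos=3, sos=4, pad=5)
array([[4, 0, 1, 3, 5],
       [4, 2, 3, 5, 5]], dtype=int32)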
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents.elements - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
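The deleted `doctr.documents.elements` page above still reads as a clear spec of the element tree: words nest in lines, lines and artefacts in blocks, blocks in pages, pages in a document, with geometry defaulting to the smallest enclosing box. A sketch against that old API (later releases moved these classes under `doctr.io`):

>>> from doctr.documents import Word, Line, Block, Page, Document
>>> w1 = Word("Hello", 0.99, ((0.1, 0.1), (0.3, 0.2)))
>>> w2 = Word("world", 0.98, ((0.35, 0.1), (0.6, 0.2)))
>>> line = Line([w1, w2])  # geometry resolved from the enclosed words
>>> page = Page([Block([line])], page_idx=0, dimensions=(595, 842))
>>> Document([page]).render()
'Hello world'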
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents.reader - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document loaded as a fitz.Document
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 pdf;
- to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the original one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
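- # Hypothetical usage sketch (file name and target size are illustrative):
- # >>> doc = fitz.open("sample.pdf")
- # >>> page_img = convert_page_to_numpy(doc[0], output_size=(1024, 726))
- # For a 595 x 842 pt A4 MediaBox, this computes scales of (726 / 595, 1024 / 842),
- # i.e. roughly 1.22 on both axes, or about 88 dpi rendering.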
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
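- # Each returned entry pairs an absolute-coordinate box with its text content,
- # e.g. (illustrative values): ((12.0, 8.5, 57.3, 20.1), 'Invoice')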
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
-
- doctr.models.detection.differentiable_binarization - docTR documentation
-
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to expand (unclip) polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: the polygon vertices, as an array of (x, y) coordinates
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
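- # Worked example (illustrative numbers): for a 100 x 20 px rectangle,
- # area = 2000 and perimeter = 240, so with unclip_ratio = 1.5 the polygon
- # is offset outwards by distance = 2000 * 1.5 / 240 = 12.5 px.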
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor of boxes for the bitmap; each box is a 5-element list
- containing xmin, ymin, xmax, ymax, score for the box (relative coordinates)
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove boxes that are too small
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
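- # e.g. on a 1024 x 1024 bitmap, min_size_box = 1 + int(1024 / 512) = 3 px, and an
- # absolute box (x=51, y=102, w=205, h=51) is stored as relative coordinates
- # (~0.05, ~0.10, 0.25, ~0.15) together with its objectness score.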
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channels to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1): # top-down pathway, from the coarsest map
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature map is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.ndarray,
- ys: np.ndarray,
- a: np.ndarray,
- b: np.ndarray,
- eps: float = 1e-7,
- ) -> np.ndarray:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
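- # Derivation sketch: with d1, d2 the squared distances to a and b, and d the squared
- # segment length, the law of cosines gives cosin = (d - d1 - d2) / (2 * sqrt(d1 * d2)),
- # and the point-to-line distance is sqrt(d1 * d2 * (1 - cosin**2) / d); for points
- # with cosin < 0, the distance to the closer endpoint is used instead.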
-
- def draw_thresh_map(
- self,
- polygon: np.ndarray,
- canvas: np.ndarray,
- mask: np.ndarray,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon: array of coordinates defining the polygon boundary
- canvas: threshold map to fill with polygons
- mask: mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
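- # Worked example (illustrative): with shrink_ratio = 0.4 and a 100 x 20 px polygon,
- # distance = 2000 * (1 - 0.4 ** 2) / 240 = 7 px, so the threshold ramp spans a 7 px
- # band around the polygon edges before rescaling to [thresh_min, thresh_max].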
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
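- # Hard negative mining sketch (illustrative counts): with 1000 positive pixels and
- # 50000 masked-in negatives, negative_count = min(50000, 3 * 1000) = 3000, so only
- # the 3000 highest-loss negatives contribute to the balanced BCE term.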
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
-
- doctr.models.detection.linknet - docTR documentation
-
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor of boxes for the bitmap; each box is a 5-element list
- containing xmin, ymin, xmax, ymax, score for the box (relative coordinates)
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num): # label 0 is the background component
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
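- # e.g. decoder_block(in_chan=512, out_chan=256) maps channels 512 -> 128 (1x1 conv),
- # doubles the spatial resolution at 128 channels (transposed conv), then maps to 256.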
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
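- # Relative to the FPN input, x_1 ... x_4 sit at 1/2 ... 1/16 resolution; each
- # decoder doubles the resolution before the element-wise skip addition.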
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
-
- doctr.models.export - docTR documentation
-
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
-
- doctr.models.recognition.crnn - docTR documentation
-
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
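- # e.g. for logits of shape (B, T, V + 1), the transpose yields the time-major
- # (T, B, V + 1) layout expected by ctc_greedy_decoder, and the dense output has
- # shape (B, T) with len(self.vocab) filling blank/padding positions.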
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
- with label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
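To make the B x H x W x C to B x W x (H * C) reshape in call concrete, here is an illustrative shape walk-through (the dimensions are examples, not those of a specific backbone):

import tensorflow as tf

features = tf.random.uniform((2, 4, 32, 64))            # B x H x W x C from the backbone
transposed = tf.transpose(features, perm=[0, 2, 1, 3])  # B x W x H x C
_, w, h, c = transposed.shape
seq = tf.reshape(transposed, (-1, w, h * c))            # one sequence step per image column
print(seq.shape)  # (2, 32, 256)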
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
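Usage note: the _crnn factory above lets keyword arguments override the default configuration. A hypothetical call (assuming this version of docTR with TensorFlow installed, and an illustrative digits-only vocab):

from doctr.models import crnn_vgg16_bn

# vocab and rnn_units are forwarded to the CRNN constructor via the patched config
model = crnn_vgg16_bn(pretrained=False, vocab="0123456789", rnn_units=256)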
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
-
- doctr.models.recognition.sar - docTR documentation
-
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
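A quick shape check for AttentionModule; the tensors are random stand-ins and the channel counts are examples only:

import tensorflow as tf

attention = AttentionModule(attention_units=512)   # defined above
features = tf.random.uniform((2, 4, 32, 512))      # N x H x W x C
hidden_state = tf.random.uniform((2, 1, 1, 512))   # N x 1 x 1 x rnn_units
glimpse = attention(features, hidden_state)
print(glimpse.shape)  # (2, 512): one attended feature vector per sample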
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
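The EOS masking in compute_loss can be illustrated with toy numbers (a sketch of the mechanism, not the production code path):

import tensorflow as tf

seq_len = tf.constant([2, 4]) + 1     # word lengths + 1 for the <eos> step
per_step_loss = tf.ones((2, 6))       # stand-in for the per-timestep cross-entropy
mask = tf.sequence_mask(seq_len, 6)   # True up to and including the EOS step
masked = tf.where(mask, per_step_loss, tf.zeros_like(per_step_loss))
loss = tf.reduce_sum(masked, axis=1) / tf.cast(seq_len, tf.float32)
print(loss.numpy())  # [1. 1.]: steps after EOS contribute nothing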
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
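A hypothetical way to exercise the updated factory shown in this hunk, passing either an architecture name or an already-built model (the isinstance branch above accepts both):

from doctr.models import crnn_vgg16_bn, recognition_predictor

# by architecture name
predictor = recognition_predictor("crnn_vgg16_bn", pretrained=True)
# or with a prebuilt model instance
model = crnn_vgg16_bn(pretrained=True)
predictor = recognition_predictor(model)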
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates the page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates the page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
-
- doctr.transforms.modules - docTR documentation
-
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example::
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: the offset added to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example::
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduces contrast if factor < 1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example::
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduces saturation if factor < 1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: the offset added to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example::
- >>> from doctr.transforms import RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomJpegQuality, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, only one of which will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
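Putting several of the modules above together, a hypothetical augmentation pipeline (all class names as defined in this file) might read:

import tensorflow as tf
from doctr.transforms import Compose, OneOf, RandomApply, RandomBrightness, RandomContrast, RandomGamma, Resize

pipeline = Compose([
    Resize((32, 128)),
    OneOf([RandomBrightness(max_delta=0.3), RandomContrast(delta=0.3)]),
    RandomApply(RandomGamma(), p=0.5),
])
out = pipeline(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))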
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
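A small sanity check of the four tolerance levels; the second call assumes anyascii maps "€" to "EUR", as the warning above suggests:

from doctr.utils.metrics import string_match

print(string_match("Hello", "hello"))  # (False, True, True, True)
print(string_match("EUR", "€"))        # expected (False, False, True, True)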
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
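A worked example for the computation above, with boxes in the documented (xmin, ymin, xmax, ymax) format:

import numpy as np

boxes_1 = np.array([[0, 0, 100, 100]], dtype=np.float32)
boxes_2 = np.array([[50, 50, 150, 150]], dtype=np.float32)
# intersection = 50 * 50 = 2500, union = 10000 + 10000 - 2500 = 17500
# so box_iou(boxes_1, boxes_2) yields roughly [[0.1429]]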
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+ mask_shape: spatial shape of the intermediate masks
+ use_broadcasting: if set to True, leverage broadcasting speedup by consuming more memory
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
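The same overlap expressed as 4-point polygons, which is the input format polygon_iou expects:

import numpy as np

polys_1 = np.array([[[0, 0], [100, 0], [100, 100], [0, 100]]], dtype=np.float32)
polys_2 = np.array([[[50, 50], [150, 50], [150, 150], [50, 150]]], dtype=np.float32)
# identical geometry to the box_iou example, so the result is again ~0.1429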
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
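A small worked example of the suppression loop:

>>> import numpy as np
>>> boxes = np.array([
>>>     [0, 0, 10, 10, 0.9],    # highest score, kept
>>>     [1, 1, 11, 11, 0.8],    # IoU ~0.68 with the first box, suppressed
>>>     [20, 20, 30, 30, 0.7],  # disjoint, kept
>>> ])
>>> nms(boxes, thresh=0.5)  # keeps indices [0, 2]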
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
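The assignment step in update pairs ground truths with predictions by maximizing total IoU (Hungarian algorithm), then thresholds; the same logic in isolation:

>>> import numpy as np
>>> from scipy.optimize import linear_sum_assignment
>>> iou_mat = np.array([[0.7, 0.1], [0.2, 0.4]])
>>> gt_idx, pred_idx = linear_sum_assignment(-iou_mat)  # negate scores to maximize
>>> int((iou_mat[gt_idx, pred_idx] >= 0.5).sum())  # 1: only the 0.7 pair clears the threshold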
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
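A minimal end-to-end check of OCRMetric with one perfectly matched box and label:

>>> import numpy as np
>>> from doctr.utils import OCRMetric
>>> metric = OCRMetric(iou_thresh=0.5)
>>> metric.update(np.array([[0.0, 0.0, 0.5, 0.5]]), np.array([[0.0, 0.0, 0.5, 0.5]]),
>>>               ["hello"], ["hello"])
>>> metric.summary()  # recall["raw"] == precision["raw"] == 1.0, mean IoU == 1.0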
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
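And the same smoke test for DetectionMetric, with class indices instead of strings:

>>> import numpy as np
>>> from doctr.utils import DetectionMetric
>>> metric = DetectionMetric(iou_thresh=0.5)
>>> metric.update(np.array([[0.0, 0.0, 0.5, 0.5]]), np.array([[0.0, 0.0, 0.5, 0.5]]),
>>>               np.array([0], dtype=np.int64), np.array([0], dtype=np.int64))
>>> metric.summary()  # (1.0, 1.0, 1.0)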
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
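For instance, a relative box covering the top-left quarter of a 600x800 page:

>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots()
>>> ax.add_patch(rect_patch(((0.0, 0.0), (0.5, 0.5)), (600, 800), label="word", color=(0, 0, 1)))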
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
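The dispatch can be exercised directly: a 2-point tuple yields a Rectangle, a (4, 2) array a Polygon:

>>> import numpy as np
>>> create_obj_patch(((0.1, 0.1), (0.4, 0.3)), (600, 800))  # -> patches.Rectangle
>>> create_obj_patch(np.array([[0.1, 0.1], [0.4, 0.1], [0.4, 0.3], [0.1, 0.3]]), (600, 800))  # -> patches.Polygon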
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
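For example, three classes get three evenly hue-spaced RGB triplets (lightness and saturation are slightly jittered at random):

>>> get_colors(3)  # hues at 0°, 120° and 240°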
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mplcursors Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mplcursors Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
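A short usage sketch for draw_boxes on a dummy image (OpenCV draws the rectangles, matplotlib displays the result):

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> image = np.zeros((200, 300, 3), dtype=np.uint8)
>>> draw_boxes(np.array([[0.1, 0.1, 0.4, 0.5]]), image)  # relative (xmin, ymin, xmax, ymax)
>>> plt.show()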
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
-
-
+
+
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
-
-
+
+
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-.. autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developer mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Whether performed at once or separately, each task corresponds to a specific type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors let you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 30595 word-level crops, which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, we warm-up the model and then we measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection will be used to produce cropped images that will be passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both the training and inference procedures. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module regroups non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model's performance.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
-
- doctr.datasets - docTR documentation
-
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
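As an illustration, here is a minimal sketch of how a concrete dataset could build on this abstract class (the URL, archive name and hash below are placeholders for this sketch, not a real dataset):
>>> from doctr.datasets.core import VisionDataset
>>> class MyDataset(VisionDataset):
...     def __init__(self, **kwargs):
...         # Hypothetical archive location: the parent class handles
...         # downloading, optional SHA256 verification and extraction
...         super().__init__(
...             url="https://example.com/my_dataset.zip",
...             file_name="my_dataset.zip",
...             file_hash=None,  # skip checksum verification in this sketch
...             extract_archive=True,
...             **kwargs,
...         )
>>> ds = MyDataset(download=True)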
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-docTR Vocabs¶
-
-Name | Size | Characters
-digits | 10 | 0123456789
-ascii_letters | 52 | abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-punctuation | 32 | !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-currency | 5 | £€¥¢฿
-latin | 96 | 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-french | 154 | 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
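If the vocab mapping is exposed by the package (an assumption; check your installed version), a vocab string can be retrieved by name:
>>> from doctr.datasets import VOCABS  # assumed export
>>> vocab = VOCABS["french"]
>>> len(vocab)
154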
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as a mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
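For instance, a short sketch using the digits vocab from the table above:
>>> from doctr.datasets import encode_sequences
>>> encoded = encode_sequences(sequences=["123", "42"], vocab="0123456789", target_size=4)
>>> encoded.shape  # one padded row per input sequence
(2, 4)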
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
-
- doctr.documents - docTR documentation
-
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words on the same horizontal level but in different columns belong to two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the rotation angle value in degrees and the confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
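Putting the structure together, a minimal sketch (assuming Document simply wraps a list of pages):
>>> from doctr.documents import Word, Line, Block, Page, Document
>>> word = Word(value="hello", confidence=0.99, geometry=((0.1, 0.1), (0.3, 0.2)))
>>> line = Line(words=[word])            # geometry resolved from its words
>>> block = Block(lines=[line])
>>> page = Page(blocks=[block], page_idx=0, dimensions=(1240, 1754))
>>> doc = Document(pages=[page])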
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert its pages into images in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF file as a bytes stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple file formats
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
Args:¶
-
+
diff --git a/v0.1.1/modules/utils.html b/v0.1.1/modules/utils.html
index 888a32c321..b7f6fc570b 100644
--- a/v0.1.1/modules/utils.html
+++ b/v0.1.1/modules/utils.html
@@ -14,7 +14,7 @@
-
+
doctr.utils - docTR documentation
@@ -715,7 +715,7 @@ Args:¶
-
+
diff --git a/v0.1.1/notebooks.html b/v0.1.1/notebooks.html
index f97771aebb..d36539f59e 100644
--- a/v0.1.1/notebooks.html
+++ b/v0.1.1/notebooks.html
@@ -14,7 +14,7 @@
-
+
docTR Notebooks - docTR documentation
@@ -391,7 +391,7 @@ docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, "83": 
19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [4, 10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 16, 
17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
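The rewritten constructor above gives CORD three mutually exclusive target modes. A minimal usage sketch follows; the argument names come from the hunk itself, and the per-mode sample layouts follow its three self.data.append branches (treat the exact dtypes as assumptions):

from doctr.datasets import CORD

# Default mode: one dict of boxes and labels per page image
train_set = CORD(train=True, download=True)
img, target = train_set[0]  # target: {"boxes": np.ndarray, "labels": list of str}

# Recognition mode: samples become (word crop, transcription) pairs
reco_set = CORD(train=True, download=True, recognition_task=True)
crop, text = reco_set[0]

# Detection mode: samples become (page image, box array) pairs
det_set = CORD(train=True, download=True, detection_task=True)
img, boxes = det_set[0]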
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
- doctr.datasets.core - docTR documentation
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
\ No newline at end of file
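The deleted doctr.datasets.core module shows how VisionDataset once centralised download, SHA256 verification, and archive extraction under ~/.cache/doctr/datasets. A hypothetical subclass, reconstructed only from the removed code above (the URL, file name, and class name are placeholders):

from doctr.datasets.core import VisionDataset  # module removed by this diff


class MyArchiveDataset(VisionDataset):
    """Fetches an archive, verifies it, and extracts it, as the removed __init__ did."""

    def __init__(self) -> None:
        super().__init__(
            url="https://example.com/my_dataset.zip",  # placeholder URL
            file_name="my_dataset.zip",
            file_hash=None,  # skip the SHA256 check in this sketch
            extract_archive=True,
            download=True,  # required on the first run
        )
        # self._root now points at the extracted folder; populate self.data here.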
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
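The use_polygons branch added in this hunk is a plain corner expansion from straight boxes. Pulled out as a standalone helper it would look like this sketch (the function name is ours, not part of docTR's API):

import numpy as np


def xyxy_to_polygon(box):
    # Expand (xmin, ymin, xmax, ymax) into the four corners in
    # top-left, top-right, bottom-right, bottom-left order,
    # mirroring the list comprehension above.
    xmin, ymin, xmax, ymax = box
    return np.array(
        [[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]],
        dtype=np.float32,
    )


# xyxy_to_polygon((10, 20, 50, 40)) -> array of shape (4, 2)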
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before passing it to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
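A short sketch of the new collate_fn hook introduced above; the custom collate shown (stacking images, keeping targets as a list) is illustrative, and train_set is assumed to be any dataset of (image, target) pairs:
>>> import tensorflow as tf
>>> from doctr.datasets import DataLoader
>>> def my_collate(samples):
...     images, targets = zip(*samples)
...     return tf.stack(images, axis=0), list(targets)
>>> train_loader = DataLoader(train_set, batch_size=32, collate_fn=my_collate)
>>> len(train_loader)  # number of batches, courtesy of the new __len__
>>> images, targets = next(iter(train_loader))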
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder the 8 flat coordinates into a (4, 2) array of (x, y) points:
+ # top left, top right, bottom right, bottom left corners (empty rows were filtered out above)
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
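A usage sketch mirroring the new SROIE options diffed above (the two task flags are mutually exclusive, per the ValueError guard):
>>> from doctr.datasets import SROIE
>>> # detection targets only: each sample is (image, boxes)
>>> det_set = SROIE(train=True, download=True, detection_task=True)
>>> img, boxes = det_set[0]  # boxes of shape (N, 4), or (N, 4, 2) with use_polygons=True
>>> # recognition targets only: each sample is (word crop, label)
>>> rec_set = SROIE(train=False, download=True, recognition_task=True)
>>> crop, label = rec_set[0]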
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionnary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or the char is still not in the vocab, fall back to the unknown character
char = unknown_char
translated += char
return translated
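A quick doctest-style sketch of translate above; it assumes "english" is one of the registered VOCABS keys:
>>> translate("Naïve", "english")  # 'ï' is NFD-normalized down to ASCII 'i'
'Naive'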
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: a array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
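A small round-trip sketch of the sequence utilities above; the toy vocab and the image path are assumptions for illustration:
>>> import numpy as np
>>> vocab = "abc"
>>> encode_string("cab", vocab)
[2, 0, 1]
>>> decode_sequence(np.array([2, 0, 1]), vocab)
'cab'
>>> # pad to a fixed length; the EOS/PAD indices must lie outside the vocab
>>> encode_sequences(["ab", "c"], vocab, target_size=5, eos=3, pad=4)
array([[0, 1, 3, 4, 4],
       [2, 3, 4, 4, 4]], dtype=int32)
>>> # crop word images given absolute (N, 4) boxes or (N, 4, 2) polygons
>>> crops = crop_bboxes_from_image("path/to/your/doc.jpg", geoms=np.array([[10, 10, 120, 40]]))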
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- doctr.documents.elements - docTR documentation
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
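Although the page above was deleted, the element hierarchy it documents still illustrates how docTR composes its output (the classes later moved to doctr.io.elements); a hedged sketch with illustrative values:
>>> from doctr.documents import Word, Line, Block, Page, Document
>>> w1 = Word("Hello", 0.99, ((0.1, 0.1), (0.3, 0.15)))
>>> w2 = Word("world", 0.98, ((0.32, 0.1), (0.5, 0.15)))
>>> line = Line([w1, w2])  # geometry resolved to the smallest enclosing bbox
>>> page = Page(blocks=[Block(lines=[line])], page_idx=0, dimensions=(595, 842))
>>> doc = Document(pages=[page])
>>> doc.render()
'Hello world'
>>> export = doc.export()  # nested dict, ready for JSON serialization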
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- doctr.documents.reader - docTR documentation
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF loaded as a fitz.Document, whose pages can then be rendered to numpy ndarrays
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 pdf;
- to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
-
-
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
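For reference, a hedged sketch of the DocumentFile entry points removed above, combining the docstring examples into one flow (paths and URLs are placeholders):
>>> from doctr.documents import DocumentFile
>>> pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
>>> pages = pdf_doc.as_images(output_size=(1024, 726))  # preserves the A4 aspect ratio
>>> words = pdf_doc.get_words()  # per-page lists of (bounding box, value) tuples
>>> web_doc = DocumentFile.from_url("https://www.yoursite.com")
>>> img_pages = DocumentFile.from_images("path/to/your/page1.png")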
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to unshrink polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: The first parameter.
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
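A tiny numeric sketch of the unclip step implemented by polygon_to_box above, using shapely and pyclipper directly (values are illustrative):
>>> import numpy as np
>>> import pyclipper
>>> from shapely.geometry import Polygon
>>> points = np.array([[0, 0], [100, 0], [100, 20], [0, 20]])
>>> poly = Polygon(points)
>>> distance = poly.area * 1.5 / poly.length  # unclip_ratio = 1.5 -> 12.5 px here
>>> offset = pyclipper.PyclipperOffset()
>>> offset.AddPath([tuple(p) for p in points], pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
>>> expanded = np.asarray(offset.Execute(distance)[0])  # the grown polygon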
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1):  # walk from the second-deepest map back to the shallowest
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature maps is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon : array of coord., to draw the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
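- # Hypothetical usage sketch (assumed shapes, not from the original source):
- # >>> import numpy as np
- # >>> canvas = np.zeros((256, 256), dtype=np.float32)
- # >>> mask = np.zeros((256, 256), dtype=np.float32)
- # >>> poly = np.array([[60, 60], [200, 60], [200, 200], [60, 200]])  # int pixel coords
- # >>> _, canvas, mask = self.draw_thresh_map(poly, canvas, mask)
- # After the call, canvas holds values in [0, 1] peaking at the polygon border
-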
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)  # np.bool is a removed deprecated alias
- thresh_target = np.zeros(output_shape, dtype=np.float32)  # draw_thresh_map writes float values in [0, 1]
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
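- # Illustrative target format (an assumption drawn from the code above): one
- # image with a single relative (xmin, ymin, xmax, ymax) box and no ambiguous flag
- # >>> import numpy as np
- # >>> target = [{'boxes': np.array([[.1, .1, .4, .3]]), 'flags': np.array([False])}]
- # >>> seg_t, seg_m, th_t, th_m = self.compute_target(target, (1, 256, 256))
-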
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
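- # Hard-negative mining, as in the DB paper: keep at most 3 negatives per
- # positive (an OHEM-style 1:3 ratio), picking the hardest ones via top_k,
- # before averaging the BCE terms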
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- numpy array of boxes, each box being a 5-element row
- (xmin, ymin, xmax, ymax, score) in relative coordinates
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num):  # labels run from 0 (background) to label_num - 1
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
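- # A minimal sketch (assumed inputs, not from the original source):
- # >>> import numpy as np
- # >>> postproc = LinkNetPostProcessor()
- # >>> pred = np.random.rand(512, 512)
- # >>> bitmap = (pred > 0.15).astype(np.float32)
- # >>> boxes = postproc.bitmap_to_boxes(pred, bitmap)  # shape (num_boxes, 5)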
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
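- # Illustrative shape check (a sketch, not from the original source): the
- # stride-2 transposed convolution doubles the spatial resolution
- # >>> import tensorflow as tf
- # >>> block = decoder_block(in_chan=128, out_chan=64)
- # >>> block(tf.zeros((1, 32, 32, 128))).shape
- # TensorShape([1, 64, 64, 64])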
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
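- # x_1..x_4 are each downsampled by 2 with 64/128/256/512 channels; every
- # decoder below upsamples by 2 and reduces channels so its output matches
- # the next skip connection before the element-wise additions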
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)  # np.bool is a removed deprecated alias
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
-
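- # Hedged usage sketch (not in the original source, reusing the `model` from
- # the docstring example above): the returned bytes can be loaded straight
- # back into the TFLite interpreter
- # >>> import tensorflow as tf
- # >>> interpreter = tf.lite.Interpreter(model_content=quantize_model(model, (224, 224, 3)))
- # >>> interpreter.allocate_tensors()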
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill([logits.shape[0]], logits.shape[1]),  # one sequence length per batch element
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
- with the label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
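- # Illustrative sketch (assuming a `vocab` string is defined): decode a batch
- # of raw logits into words
- # >>> import tensorflow as tf
- # >>> postproc = CTCPostProcessor(vocab=vocab)
- # >>> logits = tf.random.uniform((2, 32, len(vocab) + 1))
- # >>> words = postproc(logits)  # list of 2 decoded strings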
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of target strings for the batch
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
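- # Note: tf.nn.ctc_loss returns one scalar per batch element (shape (N,));
- # a training loop would typically reduce it, e.g. tf.reduce_mean(ctc_loss)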
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
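- # Illustrative shape check (a sketch, not from the original source):
- # >>> import tensorflow as tf
- # >>> att = AttentionModule(attention_units=512)
- # >>> feats, hidden = tf.zeros((1, 4, 32, 512)), tf.zeros((1, 1, 1, 512))
- # >>> att(feats, hidden).shape  # one glimpse vector per image
- # TensorShape([1, 512])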
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill([features.shape[0]], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embeded_symbol: shape (N, embedding_units)
- embeded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embeded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, rnn_units)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, 2 * rnn_units) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
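- # The loop above implements two regimes: when kwargs['training'] is set and
- # gt is provided, the next input symbol comes from the ground truth (teacher
- # forcing); at inference it is the argmax of the previous prediction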
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- model_output: predicted logits of the model
- gt: the encoded tensor with gt labels
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
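- # In short: the cross-entropy is computed per timestep, timesteps after the
- # <eos> token are masked out, and each sequence is normalized by its own
- # length, yielding one loss value per batch element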
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
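
For reference, a minimal usage sketch of the updated recognition_predictor signature shown above (the crop is random dummy data; the output format is an assumption based on the predictor's docstring):

import numpy as np
from doctr.models import recognition_predictor

# Build a word-level recognizer using the new keyword arguments
model = recognition_predictor("crnn_vgg16_bn", pretrained=True, symmetric_pad=True, batch_size=32)
crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)  # dummy word crop
out = model([crop])  # assumed output: a list of (value, confidence) pairs
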
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions
+ for each page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the general page orientation based on the median line
+ orientation of the segmentation map, then rotates the page before passing it again to the
+ detection module. Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each page.
+ Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
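
A minimal sketch of the new ocr_predictor keyword arguments (dummy input page; result.render() is assumed from docTR's Document API):

import numpy as np
from doctr.models import ocr_predictor

predictor = ocr_predictor(
    det_arch="db_resnet50",
    reco_arch="crnn_vgg16_bn",
    pretrained=True,
    assume_straight_pages=False,    # handle rotated text
    export_as_straight_boxes=True,  # but still export axis-aligned boxes
)
page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)  # dummy page
result = predictor([page])
print(result.render())  # assumed helper: plain-text rendering of the document
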
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions
+ for each page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the general page orientation based on the median line
+ orientation of the segmentation map, then rotates the page before passing it again to the
+ detection module. Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each page.
+ Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `KIEPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
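
A minimal usage sketch of kie_predictor; the per-class predictions attribute on each page is an assumption based on the KIEPredictor docs:

import numpy as np
from doctr.models import kie_predictor

model = kie_predictor(det_arch="db_resnet50", reco_arch="crnn_vgg16_bn", pretrained=True)
page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)  # dummy page
out = model([page])
# Unlike OCRPredictor, results are grouped by predicted class rather than by block
predictions = out.pages[0].predictions  # assumed attribute: dict mapping class name to predictions
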
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
-
- doctr.transforms.modules - docTR documentation
-
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Brightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- p: probability to apply transformation
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Contrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Saturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Hue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Gamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = JpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = OneOf([JpegQuality(), Gamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = RandomApply(Gamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
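
A worked example of the four tolerance levels computed by string_match/TextMatch (the values follow directly from the definitions above):

from doctr.utils.metrics import TextMatch

metric = TextMatch()
metric.update(["Hello", "world"], ["hello", "world"])
# "world" matches at every level; "Hello" vs "hello" only matches once case is ignored
print(metric.summary())
# {'raw': 0.5, 'caseless': 1.0, 'anyascii': 0.5, 'unicase': 1.0}
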
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
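
A quick numeric check of box_iou, with the arithmetic spelled out in the comments:

import numpy as np
from doctr.utils.metrics import box_iou

boxes_1 = np.array([[0, 0, 100, 100]], dtype=np.float32)
boxes_2 = np.array([[50, 50, 150, 150]], dtype=np.float32)
# intersection = 50 * 50 = 2500; union = 10000 + 10000 - 2500 = 17500
print(box_iou(boxes_1, boxes_2))  # [[0.1428...]] = 2500 / 17500
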
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
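
The same overlap expressed as 4-point polygons gives the same IoU through the shapely-based path:

import numpy as np
from doctr.utils.metrics import polygon_iou

# Two axis-aligned squares written as (N, 4, 2) polygons
polys_1 = np.array([[[0, 0], [100, 0], [100, 100], [0, 100]]], dtype=np.float32)
polys_2 = np.array([[[50, 50], [150, 50], [150, 150], [50, 150]]], dtype=np.float32)
print(polygon_iou(polys_1, polys_2))  # [[0.1428...]], matching the box_iou result above
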
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
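
A small worked example for nms: the second box overlaps the first with IoU well above the threshold, so it is suppressed:

import numpy as np
from doctr.utils.metrics import nms

# (xmin, ymin, xmax, ymax, score)
boxes = np.array([
    [0, 0, 100, 100, 0.9],      # kept (highest score)
    [5, 5, 105, 105, 0.8],      # suppressed: IoU with the first box is about 0.82 > 0.5
    [200, 200, 300, 300, 0.7],  # kept (no overlap with the first box)
])
print(nms(boxes, thresh=0.5))  # kept indexes: 0 and 2
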
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
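
Running the class docstring example end to end: the best candidate only reaches IoU 0.49 (4900 / 10000), just under the 0.5 threshold, so no pair is matched:

import numpy as np
from doctr.utils.metrics import LocalizationConfusion

metric = LocalizationConfusion(iou_thresh=0.5)
metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
recall, precision, mean_iou = metric.summary()
# recall and precision are 0.0; mean IoU averages 0.49 and 0.0 over the 2 predictions
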
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
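
The OCRMetric docstring example, run end to end (at iou_thresh=0.5 the 0.49-IoU pair is not kept, so every recall/precision entry is 0):

import numpy as np
from doctr.utils.metrics import OCRMetric

metric = OCRMetric(iou_thresh=0.5)
metric.update(
    np.asarray([[0, 0, 100, 100]]),                     # 1 ground-truth box
    np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),  # 2 predicted boxes
    ["hello"],
    ["hello", "world"],
)
recall, precision, mean_iou = metric.summary()  # recall/precision are dicts keyed raw/caseless/anyascii/unicase
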
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall, precision and mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
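
And the corresponding check for DetectionMetric, which compares class indices instead of strings:

import numpy as np
from doctr.utils.metrics import DetectionMetric

metric = DetectionMetric(iou_thresh=0.5)
metric.update(
    np.asarray([[0, 0, 100, 100]]),
    np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
    np.zeros(1, dtype=np.int64),       # ground-truth class indices
    np.array([0, 1], dtype=np.int64),  # predicted class indices
)
print(metric.summary())  # (recall, precision, mean_iou); here no pair clears the IoU threshold
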
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
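
A short sketch of the hue-stepping idea used by get_colors, with lightness and saturation fixed for reproducibility (the real helper adds random jitter to both):

import colorsys
import numpy as np

num_colors = 3
hues = np.arange(0.0, 360.0, 360.0 / num_colors) / 360.0  # evenly spaced hues: 0.0, 1/3, 2/3
colors = [colorsys.hls_to_rgb(h, 0.55, 0.95) for h in hues]  # fixed L/S instead of random jitter
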
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plots, adds text labels on top of bounding boxes
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mpl Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mpl Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
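A minimal usage sketch, assuming OpenCV and matplotlib are installed and that boxes hold relative (xmin, ymin, xmax, ymax) coordinates:

    >>> import numpy as np
    >>> import matplotlib.pyplot as plt
    >>> from doctr.utils.visualization import draw_boxes
    >>> canvas = np.zeros((200, 300, 3), dtype=np.uint8)  # blank page
    >>> boxes = np.array([[0.1, 0.2, 0.5, 0.6]], dtype=np.float32)
    >>> draw_boxes(boxes, canvas)  # rectangles drawn with cv2, displayed via matplotlib
    >>> plt.show()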
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
-
-
+
+
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
-
-
+
+
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save a significant amount of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-.. autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
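For illustration, batching one of the datasets above looks like this (``download=True`` fetches the archive on first use)::

    >>> from doctr.datasets import FUNSD, DataLoader
    >>> train_set = FUNSD(train=True, download=True)
    >>> train_loader = DataLoader(train_set, batch_size=32)
    >>> images, targets = next(iter(train_loader))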
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
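A small sketch of the expected behaviour, assuming the default ``eos`` value of -1 is used for padding::

    >>> from doctr.datasets import encode_sequences
    >>> encoded = encode_sequences(["cat", "do"], vocab="abcdefghijklmnopqrstuvwxyz", target_size=4)
    >>> encoded.shape  # one row per sequence, padded/truncated to target_size
    (2, 4)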
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
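Putting the pieces above together, a typical reading flow looks like this (the path is a placeholder)::

    >>> from doctr.documents import DocumentFile
    >>> pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
    >>> pages = pdf_doc.as_images()  # list of H x W x 3 numpy arrays
    >>> words = pdf_doc.get_words()  # per-page lists of (bounding box, value) tuples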
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developer mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Whether performed jointly or separately, each of these tasks calls for a specific type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components (a usage sketch follows the list):
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
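As a sketch of how the three components are wired together, the end-to-end predictor documented further below hides them behind a single call (random data stands in for a real page)::

    >>> import numpy as np
    >>> from doctr.models import ocr_predictor
    >>> predictor = ocr_predictor(pretrained=True)  # PreProcessor + Model + PostProcessor
    >>> page = (255 * np.random.rand(1024, 1024, 3)).astype(np.uint8)
    >>> out = predictor([[page]])  # structured, exportable document output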
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.12xlarge AWS instance (Xeon Platinum 8275L CPU) to perform the experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following (a code sketch follows the list):
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
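A code sketch of those three steps using the transforms shipped with the package (the mean/std values here are illustrative, not the actual training statistics)::

    >>> import tensorflow as tf
    >>> from doctr.transforms import Resize, Normalize
    >>> img = tf.random.uniform((800, 600, 3), 0, 1)  # a single page, values in [0, 1]
    >>> resized = Resize((1024, 1024))(img)  # 1. resize (bilinear by default)
    >>> batch = tf.expand_dims(resized, axis=0)  # 2. batch
    >>> batch = Normalize(mean=(0.5, 0.5, 0.5), std=(1., 1., 1.))(batch)  # 3. normalize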
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors let you pass numpy images as inputs and get structured information back.
-
-.. autofunction:: doctr.models.detection.detection_predictor
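Typical usage, with random data standing in for a real page::

    >>> import numpy as np
    >>> from doctr.models.detection import detection_predictor
    >>> predictor = detection_predictor(pretrained=True)
    >>> page = (255 * np.random.rand(1024, 1024, 3)).astype(np.uint8)
    >>> out = predictor([page])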
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 30595 word-level crops, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.12xlarge AWS instance (Xeon Platinum 8275L CPU) to perform the experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
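Typical usage, with a random crop standing in for a real word image::

    >>> import numpy as np
    >>> from doctr.models.recognition import recognition_predictor
    >>> predictor = recognition_predictor(pretrained=True)
    >>> crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)
    >>> out = predictor([crop])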
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, we warm-up the model and then we measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.12xlarge AWS instance (Xeon Platinum 8275L CPU) to perform the experiments.
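A plain-Python sketch of that protocol, assuming ``predictor`` is an end-to-end predictor as above and a random page stands in for dataset samples::

    >>> import time
    >>> import numpy as np
    >>> page = (255 * np.random.rand(1024, 1024, 3)).astype(np.uint8)
    >>> for _ in range(100):  # warm-up
    ...     _ = predictor([[page]])
    >>> start = time.perf_counter()
    >>> for _ in range(1000):  # timed runs, batch size of 1
    ...     _ = predictor([[page]])
    >>> fps = 1000 / (time.perf_counter() - start)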
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection stage produces cropped images that are passed to the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both the training and inference procedures. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively (an example follows the class list below).
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
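For instance, reusing the resizing transform documented above::

    >>> import tensorflow as tf
    >>> from doctr.transforms import Compose, Resize
    >>> transfo = Compose([Resize((32, 32))])
    >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], maxval=1, dtype=tf.float32))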
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module groups non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model's performance (a usage sketch follows the class list).
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
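A small usage sketch, assuming the ``update``/``summary`` protocol shared by these metrics::

    >>> from doctr.utils.metrics import ExactMatch
    >>> metric = ExactMatch()
    >>> metric.update(['Hello', 'world'], ['hello', 'world'])
    >>> metric.summary()  # ratio of exact matches; case differs on the first pair
    0.5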
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
- doctr.datasets - docTR documentation
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from "CORD: A Consolidated Receipt Dataset for Post-OCR Parsing".
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-¶
-
-
-
-
-
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
- doctr.documents - docTR documentation
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page's size
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Fetch a web page and convert it into a PDF, returned as a bytes stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from "CORD: A Consolidated Receipt Dataset for Post-OCR Parsing".
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
docTR Notebooks
-
+
diff --git a/v0.1.1/search.html b/v0.1.1/search.html
index 82b8bd6950..d050f5eac7 100644
--- a/v0.1.1/search.html
+++ b/v0.1.1/search.html
@@ -14,7 +14,7 @@
-
+
Search - docTR documentation
@@ -340,7 +340,7 @@
-
+
diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
index bfa546d0e9..6f154115ab 100644
--- a/v0.1.1/searchindex.js
+++ b/v0.1.1/searchindex.js
@@ -1 +1 @@
-Search.setIndex({...})
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Correction": [[2, "correction"]], "2. Warning": [[2, "warning"]], "3. Temporary Ban": [[2, "temporary-ban"]], "4. Permanent Ban": [[2, "permanent-ban"]], "AWS Lambda": [[14, null]], "Advanced options": [[19, "advanced-options"]], "Args:": [[7, "args"], [7, "id4"], [7, "id7"], [7, "id10"], [7, "id13"], [7, "id16"], [7, "id19"], [7, "id22"], [7, "id25"], [7, "id29"], [7, "id32"], [7, "id37"], [7, "id40"], [7, "id46"], [7, "id49"], [7, "id50"], [7, "id51"], [7, "id54"], [7, "id57"], [7, "id60"], [7, "id61"], [8, "args"], [8, "id2"], [8, "id3"], [8, "id4"], [8, "id5"], [8, "id6"], [8, "id7"], [8, "id10"], [8, "id12"], [8, "id14"], [8, "id16"], [8, "id20"], [8, "id24"], [8, "id28"], [9, "args"], [9, "id3"], [9, "id8"], [9, "id13"], [9, "id17"], [9, "id21"], [9, "id26"], [9, "id31"], [9, "id36"], [9, "id41"], [9, "id46"], [9, "id50"], [9, "id54"], [9, "id59"], [9, "id63"], [9, "id68"], [9, "id73"], [9, "id77"], [9, "id81"], [9, "id85"], [9, "id90"], [9, "id95"], [9, "id99"], [9, "id104"], [9, "id109"], [9, "id114"], [9, "id119"], [9, "id123"], [9, "id127"], [9, "id132"], [9, "id137"], [9, "id142"], [9, "id146"], [9, "id150"], [9, "id155"], [9, "id159"], [9, "id163"], [9, "id167"], [9, "id169"], [9, "id171"], [9, "id173"], [10, "args"], [10, "id1"], [10, "id2"], [10, "id3"], [10, "id4"], [10, "id5"], [10, "id6"], [10, "id7"], [10, "id8"], [10, "id9"], [10, "id10"], [10, "id11"], [10, "id12"], [10, "id13"], [10, "id14"], [10, "id15"], [10, "id16"], [10, "id17"], [10, "id18"], [10, "id19"], [11, "args"], [11, "id3"], [11, "id4"], [11, "id5"], [11, "id6"], [11, "id7"], [11, "id8"], [11, "id9"]], "Artefact": [[8, "artefact"]], "ArtefactDetection": [[16, "artefactdetection"]], "Attribution": [[2, "attribution"]], "Available Datasets": [[17, "available-datasets"]], "Available architectures": [[19, "available-architectures"], [19, "id1"], [19, "id2"]], "Available contribution modules": [[16, "available-contribution-modules"]], "Block": [[8, "block"]], "Changelog": [[0, null]], "Choose a ready to use dataset": [[17, null]], "Choosing the right model": [[19, null]], "Classification": [[15, "classification"]], "Code quality": [[3, "code-quality"]], "Code style verification": [[3, "code-style-verification"]], "Codebase structure": [[3, "codebase-structure"]], "Commits": [[3, "commits"]], "Community resources": [[1, null]], "Composing transformations": [[10, "composing-transformations"]], "Continuous Integration": [[3, "continuous-integration"]], "Contributing to docTR": [[3, null]], "Contributor Covenant Code of Conduct": [[2, null]], "Custom dataset loader": [[7, "custom-dataset-loader"]], "Custom orientation classification models": [[13, "custom-orientation-classification-models"]], "Data Loading": [[17, "data-loading"]], "Dataloader": [[7, "dataloader"]], "Detection": [[15, "detection"], [17, "detection"]], "Detection predictors": [[19, "detection-predictors"]], "Developer mode installation": [[3, "developer-mode-installation"]], "Developing docTR": [[3, "developing-doctr"]], "Document": [[8, "document"]], "Document structure": [[8, "document-structure"]], "End-to-End OCR": [[19, "end-to-end-ocr"]], "Enforcement": [[2, "enforcement"]], "Enforcement Guidelines": [[2, "enforcement-guidelines"]], "Enforcement Responsibilities": [[2, "enforcement-responsibilities"]], "Export to ONNX": [[18, "export-to-onnx"]], "Feature requests & bug report": [[3, "feature-requests-bug-report"]], "Feedback": [[3, "feedback"]], "File reading": [[8, "file-reading"]], "Half-precision": [[18, 
"half-precision"]], "Installation": [[4, null]], "Integrate contributions into your pipeline": [[16, null]], "Let\u2019s connect": [[3, "let-s-connect"]], "Line": [[8, "line"]], "Loading from Huggingface Hub": [[15, "loading-from-huggingface-hub"]], "Loading your custom trained model": [[13, "loading-your-custom-trained-model"]], "Loading your custom trained orientation classification model": [[13, "loading-your-custom-trained-orientation-classification-model"]], "Main Features": [[5, "main-features"]], "Model optimization": [[18, "model-optimization"]], "Model zoo": [[5, "model-zoo"]], "Modifying the documentation": [[3, "modifying-the-documentation"]], "Naming conventions": [[15, "naming-conventions"]], "OCR": [[17, "ocr"]], "Object Detection": [[17, "object-detection"]], "Our Pledge": [[2, "our-pledge"]], "Our Standards": [[2, "our-standards"]], "Page": [[8, "page"]], "Preparing your model for inference": [[18, null]], "Prerequisites": [[4, "prerequisites"]], "Pretrained community models": [[15, "pretrained-community-models"]], "Pushing to the Huggingface Hub": [[15, "pushing-to-the-huggingface-hub"]], "Questions": [[3, "questions"]], "Recognition": [[15, "recognition"], [17, "recognition"]], "Recognition predictors": [[19, "recognition-predictors"]], "Returns:": [[7, "returns"], [8, "returns"], [8, "id11"], [8, "id13"], [8, "id15"], [8, "id19"], [8, "id23"], [8, "id27"], [8, "id31"], [9, "returns"], [9, "id6"], [9, "id11"], [9, "id16"], [9, "id20"], [9, "id24"], [9, "id29"], [9, "id34"], [9, "id39"], [9, "id44"], [9, "id49"], [9, "id53"], [9, "id57"], [9, "id62"], [9, "id66"], [9, "id71"], [9, "id76"], [9, "id80"], [9, "id84"], [9, "id88"], [9, "id93"], [9, "id98"], [9, "id102"], [9, "id107"], [9, "id112"], [9, "id117"], [9, "id122"], [9, "id126"], [9, "id130"], [9, "id135"], [9, "id140"], [9, "id145"], [9, "id149"], [9, "id153"], [9, "id158"], [9, "id162"], [9, "id166"], [9, "id168"], [9, "id170"], [9, "id172"], [11, "returns"]], "Scope": [[2, "scope"]], "Share your model with the community": [[15, null]], "Supported Vocabs": [[7, "supported-vocabs"]], "Supported contribution modules": [[6, "supported-contribution-modules"]], "Supported datasets": [[5, "supported-datasets"]], "Supported transformations": [[10, "supported-transformations"]], "Synthetic dataset generator": [[7, "synthetic-dataset-generator"], [17, "synthetic-dataset-generator"]], "Task evaluation": [[11, "task-evaluation"]], "Text Detection": [[19, "text-detection"]], "Text Recognition": [[19, "text-recognition"]], "Text detection models": [[5, "text-detection-models"]], "Text recognition models": [[5, "text-recognition-models"]], "Train your own model": [[13, null]], "Two-stage approaches": [[19, "two-stage-approaches"]], "Unit tests": [[3, "unit-tests"]], "Use your own datasets": [[17, "use-your-own-datasets"]], "Using your ONNX exported model": [[18, "using-your-onnx-exported-model"]], "Via Conda (Only for Linux)": [[4, "via-conda-only-for-linux"]], "Via Git": [[4, "via-git"]], "Via Python Package": [[4, "via-python-package"]], "Visualization": [[11, "visualization"]], "What should I do with the output?": [[19, "what-should-i-do-with-the-output"]], "Word": [[8, "word"]], "docTR Notebooks": [[12, null]], "docTR Vocabs": [[7, "id62"]], "docTR: Document Text Recognition": [[5, null]], "doctr.contrib": [[6, null]], "doctr.datasets": [[7, null], [7, "datasets"]], "doctr.io": [[8, null]], "doctr.models": [[9, null]], "doctr.models.classification": [[9, "doctr-models-classification"]], "doctr.models.detection": [[9, 
"doctr-models-detection"]], "doctr.models.factory": [[9, "doctr-models-factory"]], "doctr.models.recognition": [[9, "doctr-models-recognition"]], "doctr.models.zoo": [[9, "doctr-models-zoo"]], "doctr.transforms": [[10, null]], "doctr.utils": [[11, null]], "v0.1.0 (2021-03-05)": [[0, "v0-1-0-2021-03-05"]], "v0.1.1 (2021-03-18)": [[0, "v0-1-1-2021-03-18"]], "v0.10.0 (2024-10-21)": [[0, "v0-10-0-2024-10-21"]], "v0.2.0 (2021-05-11)": [[0, "v0-2-0-2021-05-11"]], "v0.2.1 (2021-05-28)": [[0, "v0-2-1-2021-05-28"]], "v0.3.0 (2021-07-02)": [[0, "v0-3-0-2021-07-02"]], "v0.3.1 (2021-08-27)": [[0, "v0-3-1-2021-08-27"]], "v0.4.0 (2021-10-01)": [[0, "v0-4-0-2021-10-01"]], "v0.4.1 (2021-11-22)": [[0, "v0-4-1-2021-11-22"]], "v0.5.0 (2021-12-31)": [[0, "v0-5-0-2021-12-31"]], "v0.5.1 (2022-03-22)": [[0, "v0-5-1-2022-03-22"]], "v0.6.0 (2022-09-29)": [[0, "v0-6-0-2022-09-29"]], "v0.7.0 (2023-09-09)": [[0, "v0-7-0-2023-09-09"]], "v0.8.0 (2024-02-28)": [[0, "v0-8-0-2024-02-28"]], "v0.8.1 (2024-03-04)": [[0, "v0-8-1-2024-03-04"]], "v0.9.0 (2024-08-08)": [[0, "v0-9-0-2024-08-08"]]}, "docnames": ["changelog", "community/resources", "contributing/code_of_conduct", "contributing/contributing", "getting_started/installing", "index", "modules/contrib", "modules/datasets", "modules/io", "modules/models", "modules/transforms", "modules/utils", "notebooks", "using_doctr/custom_models_training", "using_doctr/running_on_aws", "using_doctr/sharing_models", "using_doctr/using_contrib_modules", "using_doctr/using_datasets", "using_doctr/using_model_export", "using_doctr/using_models"], "envversion": {"sphinx": 64, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinx.ext.viewcode": 1}, "filenames": ["changelog.rst", "community/resources.rst", "contributing/code_of_conduct.md", "contributing/contributing.md", "getting_started/installing.rst", "index.rst", "modules/contrib.rst", "modules/datasets.rst", "modules/io.rst", "modules/models.rst", "modules/transforms.rst", "modules/utils.rst", "notebooks.rst", "using_doctr/custom_models_training.rst", "using_doctr/running_on_aws.rst", "using_doctr/sharing_models.rst", "using_doctr/using_contrib_modules.rst", "using_doctr/using_datasets.rst", "using_doctr/using_model_export.rst", "using_doctr/using_models.rst"], "indexentries": {"artefact (class in doctr.io)": [[8, "doctr.io.Artefact", false]], "block (class in doctr.io)": [[8, "doctr.io.Block", false]], "channelshuffle (class in doctr.transforms)": [[10, "doctr.transforms.ChannelShuffle", false]], "charactergenerator (class in doctr.datasets)": [[7, "doctr.datasets.CharacterGenerator", false]], "colorinversion (class in doctr.transforms)": [[10, "doctr.transforms.ColorInversion", false]], "compose (class in doctr.transforms)": [[10, "doctr.transforms.Compose", false]], "cord (class in doctr.datasets)": [[7, "doctr.datasets.CORD", false]], "crnn_mobilenet_v3_large() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_large", false]], "crnn_mobilenet_v3_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_mobilenet_v3_small", false]], "crnn_vgg16_bn() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.crnn_vgg16_bn", false]], "crop_orientation_predictor() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.crop_orientation_predictor", false]], "dataloader (class in doctr.datasets.loader)": [[7, "doctr.datasets.loader.DataLoader", false]], "db_mobilenet_v3_large() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_mobilenet_v3_large", false]], "db_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.db_resnet50", false]], "decode_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.decode_img_as_tensor", false]], "detection_predictor() (in module doctr.models.detection)": [[9, "doctr.models.detection.detection_predictor", false]], "detectiondataset (class in doctr.datasets)": [[7, "doctr.datasets.DetectionDataset", false]], "detectionmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.DetectionMetric", false]], "docartefacts (class in doctr.datasets)": [[7, "doctr.datasets.DocArtefacts", false]], "document (class in doctr.io)": [[8, "doctr.io.Document", false]], "documentfile (class in doctr.io)": [[8, "doctr.io.DocumentFile", false]], "encode_sequences() (in module doctr.datasets)": [[7, "doctr.datasets.encode_sequences", false]], "fast_base() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_base", false]], "fast_small() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_small", false]], "fast_tiny() (in module doctr.models.detection)": [[9, "doctr.models.detection.fast_tiny", false]], "from_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.from_hub", false]], "from_images() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_images", false]], "from_pdf() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_pdf", false]], "from_url() (doctr.io.documentfile class method)": [[8, "doctr.io.DocumentFile.from_url", false]], "funsd (class in doctr.datasets)": [[7, "doctr.datasets.FUNSD", false]], "gaussianblur (class in doctr.transforms)": [[10, "doctr.transforms.GaussianBlur", false]], "gaussiannoise (class in doctr.transforms)": [[10, "doctr.transforms.GaussianNoise", false]], "ic03 (class in doctr.datasets)": [[7, "doctr.datasets.IC03", false]], "ic13 (class in doctr.datasets)": [[7, "doctr.datasets.IC13", false]], "iiit5k (class in doctr.datasets)": [[7, "doctr.datasets.IIIT5K", false]], "iiithws (class in doctr.datasets)": [[7, "doctr.datasets.IIITHWS", false]], "imgur5k (class in doctr.datasets)": [[7, "doctr.datasets.IMGUR5K", false]], "kie_predictor() (in module doctr.models)": [[9, "doctr.models.kie_predictor", false]], "lambdatransformation (class in doctr.transforms)": [[10, "doctr.transforms.LambdaTransformation", false]], "line (class in doctr.io)": [[8, "doctr.io.Line", false]], "linknet_resnet18() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet18", false]], "linknet_resnet34() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet34", false]], "linknet_resnet50() (in module doctr.models.detection)": [[9, "doctr.models.detection.linknet_resnet50", false]], "localizationconfusion (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.LocalizationConfusion", false]], "login_to_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.login_to_hub", false]], "magc_resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.magc_resnet31", false]], "master() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.master", false]], "mjsynth (class in doctr.datasets)": [[7, 
"doctr.datasets.MJSynth", false]], "mobilenet_v3_large() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large", false]], "mobilenet_v3_large_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_large_r", false]], "mobilenet_v3_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small", false]], "mobilenet_v3_small_crop_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_crop_orientation", false]], "mobilenet_v3_small_page_orientation() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_page_orientation", false]], "mobilenet_v3_small_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.mobilenet_v3_small_r", false]], "normalize (class in doctr.transforms)": [[10, "doctr.transforms.Normalize", false]], "ocr_predictor() (in module doctr.models)": [[9, "doctr.models.ocr_predictor", false]], "ocrdataset (class in doctr.datasets)": [[7, "doctr.datasets.OCRDataset", false]], "ocrmetric (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.OCRMetric", false]], "oneof (class in doctr.transforms)": [[10, "doctr.transforms.OneOf", false]], "page (class in doctr.io)": [[8, "doctr.io.Page", false]], "page_orientation_predictor() (in module doctr.models.classification)": [[9, "doctr.models.classification.page_orientation_predictor", false]], "parseq() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.parseq", false]], "push_to_hf_hub() (in module doctr.models.factory)": [[9, "doctr.models.factory.push_to_hf_hub", false]], "randomapply (class in doctr.transforms)": [[10, "doctr.transforms.RandomApply", false]], "randombrightness (class in doctr.transforms)": [[10, "doctr.transforms.RandomBrightness", false]], "randomcontrast (class in doctr.transforms)": [[10, "doctr.transforms.RandomContrast", false]], "randomcrop (class in doctr.transforms)": [[10, "doctr.transforms.RandomCrop", false]], "randomgamma (class in doctr.transforms)": [[10, "doctr.transforms.RandomGamma", false]], "randomhorizontalflip (class in doctr.transforms)": [[10, "doctr.transforms.RandomHorizontalFlip", false]], "randomhue (class in doctr.transforms)": [[10, "doctr.transforms.RandomHue", false]], "randomjpegquality (class in doctr.transforms)": [[10, "doctr.transforms.RandomJpegQuality", false]], "randomresize (class in doctr.transforms)": [[10, "doctr.transforms.RandomResize", false]], "randomrotate (class in doctr.transforms)": [[10, "doctr.transforms.RandomRotate", false]], "randomsaturation (class in doctr.transforms)": [[10, "doctr.transforms.RandomSaturation", false]], "randomshadow (class in doctr.transforms)": [[10, "doctr.transforms.RandomShadow", false]], "read_html() (in module doctr.io)": [[8, "doctr.io.read_html", false]], "read_img_as_numpy() (in module doctr.io)": [[8, "doctr.io.read_img_as_numpy", false]], "read_img_as_tensor() (in module doctr.io)": [[8, "doctr.io.read_img_as_tensor", false]], "read_pdf() (in module doctr.io)": [[8, "doctr.io.read_pdf", false]], "recognition_predictor() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.recognition_predictor", false]], "recognitiondataset (class in doctr.datasets)": [[7, "doctr.datasets.RecognitionDataset", false]], "resize (class in doctr.transforms)": [[10, "doctr.transforms.Resize", false]], "resnet18() (in module doctr.models.classification)": [[9, 
"doctr.models.classification.resnet18", false]], "resnet31() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet31", false]], "resnet34() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet34", false]], "resnet50() (in module doctr.models.classification)": [[9, "doctr.models.classification.resnet50", false]], "sar_resnet31() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.sar_resnet31", false]], "show() (doctr.io.document method)": [[8, "doctr.io.Document.show", false]], "show() (doctr.io.page method)": [[8, "doctr.io.Page.show", false]], "sroie (class in doctr.datasets)": [[7, "doctr.datasets.SROIE", false]], "summary() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.summary", false]], "summary() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.summary", false]], "summary() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.summary", false]], "summary() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.summary", false]], "svhn (class in doctr.datasets)": [[7, "doctr.datasets.SVHN", false]], "svt (class in doctr.datasets)": [[7, "doctr.datasets.SVT", false]], "synthtext (class in doctr.datasets)": [[7, "doctr.datasets.SynthText", false]], "textmatch (class in doctr.utils.metrics)": [[11, "doctr.utils.metrics.TextMatch", false]], "textnet_base() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_base", false]], "textnet_small() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_small", false]], "textnet_tiny() (in module doctr.models.classification)": [[9, "doctr.models.classification.textnet_tiny", false]], "togray (class in doctr.transforms)": [[10, "doctr.transforms.ToGray", false]], "update() (doctr.utils.metrics.detectionmetric method)": [[11, "doctr.utils.metrics.DetectionMetric.update", false]], "update() (doctr.utils.metrics.localizationconfusion method)": [[11, "doctr.utils.metrics.LocalizationConfusion.update", false]], "update() (doctr.utils.metrics.ocrmetric method)": [[11, "doctr.utils.metrics.OCRMetric.update", false]], "update() (doctr.utils.metrics.textmatch method)": [[11, "doctr.utils.metrics.TextMatch.update", false]], "vgg16_bn_r() (in module doctr.models.classification)": [[9, "doctr.models.classification.vgg16_bn_r", false]], "visualize_page() (in module doctr.utils.visualization)": [[11, "doctr.utils.visualization.visualize_page", false]], "vit_b() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_b", false]], "vit_s() (in module doctr.models.classification)": [[9, "doctr.models.classification.vit_s", false]], "vitstr_base() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_base", false]], "vitstr_small() (in module doctr.models.recognition)": [[9, "doctr.models.recognition.vitstr_small", false]], "wildreceipt (class in doctr.datasets)": [[7, "doctr.datasets.WILDRECEIPT", false]], "word (class in doctr.io)": [[8, "doctr.io.Word", false]], "wordgenerator (class in doctr.datasets)": [[7, "doctr.datasets.WordGenerator", false]]}, "objects": {"doctr.datasets": [[7, 0, 1, "", "CORD"], [7, 0, 1, "", "CharacterGenerator"], [7, 0, 1, "", "DetectionDataset"], [7, 0, 1, "", "DocArtefacts"], [7, 0, 1, "", "FUNSD"], [7, 0, 1, "", "IC03"], [7, 0, 1, "", "IC13"], [7, 0, 1, "", "IIIT5K"], [7, 0, 1, "", "IIITHWS"], [7, 0, 1, "", 
"IMGUR5K"], [7, 0, 1, "", "MJSynth"], [7, 0, 1, "", "OCRDataset"], [7, 0, 1, "", "RecognitionDataset"], [7, 0, 1, "", "SROIE"], [7, 0, 1, "", "SVHN"], [7, 0, 1, "", "SVT"], [7, 0, 1, "", "SynthText"], [7, 0, 1, "", "WILDRECEIPT"], [7, 0, 1, "", "WordGenerator"], [7, 1, 1, "", "encode_sequences"]], "doctr.datasets.loader": [[7, 0, 1, "", "DataLoader"]], "doctr.io": [[8, 0, 1, "", "Artefact"], [8, 0, 1, "", "Block"], [8, 0, 1, "", "Document"], [8, 0, 1, "", "DocumentFile"], [8, 0, 1, "", "Line"], [8, 0, 1, "", "Page"], [8, 0, 1, "", "Word"], [8, 1, 1, "", "decode_img_as_tensor"], [8, 1, 1, "", "read_html"], [8, 1, 1, "", "read_img_as_numpy"], [8, 1, 1, "", "read_img_as_tensor"], [8, 1, 1, "", "read_pdf"]], "doctr.io.Document": [[8, 2, 1, "", "show"]], "doctr.io.DocumentFile": [[8, 2, 1, "", "from_images"], [8, 2, 1, "", "from_pdf"], [8, 2, 1, "", "from_url"]], "doctr.io.Page": [[8, 2, 1, "", "show"]], "doctr.models": [[9, 1, 1, "", "kie_predictor"], [9, 1, 1, "", "ocr_predictor"]], "doctr.models.classification": [[9, 1, 1, "", "crop_orientation_predictor"], [9, 1, 1, "", "magc_resnet31"], [9, 1, 1, "", "mobilenet_v3_large"], [9, 1, 1, "", "mobilenet_v3_large_r"], [9, 1, 1, "", "mobilenet_v3_small"], [9, 1, 1, "", "mobilenet_v3_small_crop_orientation"], [9, 1, 1, "", "mobilenet_v3_small_page_orientation"], [9, 1, 1, "", "mobilenet_v3_small_r"], [9, 1, 1, "", "page_orientation_predictor"], [9, 1, 1, "", "resnet18"], [9, 1, 1, "", "resnet31"], [9, 1, 1, "", "resnet34"], [9, 1, 1, "", "resnet50"], [9, 1, 1, "", "textnet_base"], [9, 1, 1, "", "textnet_small"], [9, 1, 1, "", "textnet_tiny"], [9, 1, 1, "", "vgg16_bn_r"], [9, 1, 1, "", "vit_b"], [9, 1, 1, "", "vit_s"]], "doctr.models.detection": [[9, 1, 1, "", "db_mobilenet_v3_large"], [9, 1, 1, "", "db_resnet50"], [9, 1, 1, "", "detection_predictor"], [9, 1, 1, "", "fast_base"], [9, 1, 1, "", "fast_small"], [9, 1, 1, "", "fast_tiny"], [9, 1, 1, "", "linknet_resnet18"], [9, 1, 1, "", "linknet_resnet34"], [9, 1, 1, "", "linknet_resnet50"]], "doctr.models.factory": [[9, 1, 1, "", "from_hub"], [9, 1, 1, "", "login_to_hub"], [9, 1, 1, "", "push_to_hf_hub"]], "doctr.models.recognition": [[9, 1, 1, "", "crnn_mobilenet_v3_large"], [9, 1, 1, "", "crnn_mobilenet_v3_small"], [9, 1, 1, "", "crnn_vgg16_bn"], [9, 1, 1, "", "master"], [9, 1, 1, "", "parseq"], [9, 1, 1, "", "recognition_predictor"], [9, 1, 1, "", "sar_resnet31"], [9, 1, 1, "", "vitstr_base"], [9, 1, 1, "", "vitstr_small"]], "doctr.transforms": [[10, 0, 1, "", "ChannelShuffle"], [10, 0, 1, "", "ColorInversion"], [10, 0, 1, "", "Compose"], [10, 0, 1, "", "GaussianBlur"], [10, 0, 1, "", "GaussianNoise"], [10, 0, 1, "", "LambdaTransformation"], [10, 0, 1, "", "Normalize"], [10, 0, 1, "", "OneOf"], [10, 0, 1, "", "RandomApply"], [10, 0, 1, "", "RandomBrightness"], [10, 0, 1, "", "RandomContrast"], [10, 0, 1, "", "RandomCrop"], [10, 0, 1, "", "RandomGamma"], [10, 0, 1, "", "RandomHorizontalFlip"], [10, 0, 1, "", "RandomHue"], [10, 0, 1, "", "RandomJpegQuality"], [10, 0, 1, "", "RandomResize"], [10, 0, 1, "", "RandomRotate"], [10, 0, 1, "", "RandomSaturation"], [10, 0, 1, "", "RandomShadow"], [10, 0, 1, "", "Resize"], [10, 0, 1, "", "ToGray"]], "doctr.utils.metrics": [[11, 0, 1, "", "DetectionMetric"], [11, 0, 1, "", "LocalizationConfusion"], [11, 0, 1, "", "OCRMetric"], [11, 0, 1, "", "TextMatch"]], "doctr.utils.metrics.DetectionMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.LocalizationConfusion": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], 
"doctr.utils.metrics.OCRMetric": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.metrics.TextMatch": [[11, 2, 1, "", "summary"], [11, 2, 1, "", "update"]], "doctr.utils.visualization": [[11, 1, 1, "", "visualize_page"]]}, "objnames": {"0": ["py", "class", "Python class"], "1": ["py", "function", "Python function"], "2": ["py", "method", "Python method"]}, "objtypes": {"0": "py:class", "1": "py:function", "2": "py:method"}, "terms": {"": [2, 8, 9, 11, 15, 18], "0": [2, 4, 7, 10, 11, 13, 16, 17, 19], "00": 19, "01": 19, "0123456789": 7, "0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "0123456789\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "02562": 9, "03": 19, "035": 19, "0361328125": 19, "04": 19, "05": 19, "06": 19, "06640625": 19, "07": 19, "08": [10, 19], "09": 19, "0966796875": 19, "1": [7, 8, 9, 10, 11, 13, 17, 19], "10": [4, 7, 11, 19], "100": [7, 10, 11, 17, 19], "1000": 19, "101": 7, "1024": [9, 13, 19], "104": 7, "106": 7, "108": 7, "1095": 17, "11": 19, "110": 11, "1107": 17, "114": 7, "115": 7, "1156": 17, "116": 7, "118": 7, "11800h": 19, "11th": 19, "12": 19, "120": 7, "123": 7, "126": 7, "1268": 17, "128": [9, 13, 18, 19], "13": 19, "130": 7, "13068": 17, "131": 7, "1337891": 17, "1357421875": 19, "1396484375": 19, "14": 19, "1420": 19, "14470v1": 7, "149": 17, "15": 19, "150": [11, 19], "1552": 19, "16": [9, 18, 19], "1630859375": 19, "1684": 19, "16x16": 9, "17": 19, "1778": 19, "1782": 19, "18": [9, 19], "185546875": 19, "1900": 19, "1910": 9, "19342": 17, "19370": 17, "195": 7, "19598": 17, "199": 19, "1999": 19, "2": [4, 5, 7, 8, 9, 10, 16, 19], "20": 19, "200": 11, "2000": 17, "2003": [5, 7], "2012": 7, "2013": [5, 7], "2015": 7, "2019": 5, "2023": 1, "207901": 17, "21": 19, "2103": 7, "2186": 17, "21888": 17, "22": 19, "224": [9, 10], "225": 10, "22672": 17, "229": [10, 17], "23": 19, "233": 17, "236": 7, "24": 19, "246": 17, "249": 17, "25": 19, "2504": 19, "255": [8, 9, 10, 11, 19], "256": 9, "257": 17, "26": 19, "26032": 17, "264": 13, "27": 19, "2700": 17, "2710": 19, "2749": 13, "28": 19, "287": 13, "29": 19, "296": 13, "299": 13, "2d": 19, "3": [4, 5, 8, 9, 10, 11, 18, 19], "30": 19, "300": 17, "3000": 17, "301": 13, "30595": 19, "30ghz": 19, "31": 9, "32": [7, 9, 10, 13, 17, 18, 19], "3232421875": 19, "33": [10, 19], "33402": 17, "33608": 17, "34": [9, 19], "340": 19, "3456": 19, "3515625": 19, "36": 19, "360": 17, "37": [7, 19], "38": 19, "39": 19, "4": [9, 10, 11, 19], "40": 19, "406": 10, "41": 19, "42": 19, "43": 19, "44": 19, "45": 19, "456": 10, "46": 19, "47": 19, "472": 17, "48": [7, 19], "485": 10, "49": 19, "49377": 17, "5": [7, 10, 11, 16, 19], "50": [9, 17, 19], "51": 19, "51171875": 19, "512": 9, "52": [7, 19], "529": 19, "53": 19, "54": 19, "540": 19, "5478515625": 19, "55": 19, "56": 19, "57": 19, "58": [7, 19], "580": 19, "5810546875": 19, "583": 19, "59": 19, "597": 19, "5k": [5, 7], "5m": 19, "6": [10, 19], "60": 10, "600": [9, 11, 19], "61": 19, "62": 19, "626": 17, "63": 19, "64": [9, 10, 19], "641": 19, "647": 17, "65": 19, "66": 19, "67": 19, "68": 19, "69": 19, "693": 13, "694": 13, "695": 13, "6m": 19, "7": 19, "70": [7, 11, 19], "707470": 17, "71": [7, 19], "7100000": 17, "7141797": 17, "7149": 17, "72": 19, "72dpi": 8, "73": 19, "73257": 17, "74": 19, "75": [10, 19], "7581382": 17, "76": 19, "77": 19, "772": 13, "772875": 17, "78": 19, "785": 13, "79": 19, "793533": 17, "796": 17, "798": 13, "7m": 19, "8": [9, 10, 19], "80": 19, "800": [9, 11, 17, 19], "81": 19, "82": 19, 
"83": 19, "84": 19, "849": 17, "85": 19, "8564453125": 19, "857": 19, "85875": 17, "86": 19, "8603515625": 19, "87": 19, "8707": 17, "88": 19, "89": 19, "9": [10, 19], "90": 19, "90k": 7, "90kdict32px": 7, "91": 19, "914085328578949": 19, "92": 19, "93": 19, "94": [7, 19], "95": [11, 19], "9578408598899841": 19, "96": 19, "97": 19, "98": 19, "99": 19, "9949972033500671": 19, "A": [2, 3, 5, 7, 8, 9, 12, 18], "As": 3, "Be": 19, "Being": 2, "By": 14, "For": [2, 3, 4, 13, 19], "If": [3, 8, 9, 13, 19], "In": [3, 7, 17], "It": [10, 15, 16, 18], "Its": [5, 9], "No": [2, 19], "Of": 7, "Or": [16, 18], "The": [2, 3, 7, 8, 11, 14, 16, 17, 18, 19], "Then": 9, "To": [3, 4, 14, 15, 16, 18, 19], "_": [2, 7, 9], "__call__": 19, "_build": 3, "_i": 11, "ab": 7, "abc": 18, "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz": 7, "abdef": [7, 17], "abl": [17, 19], "about": [2, 17, 19], "abov": 19, "abstract": 1, "abstractdataset": 7, "abus": 2, "accept": 2, "access": [5, 8, 17, 19], "account": [2, 15], "accur": 19, "accuraci": 11, "achiev": 18, "act": 2, "action": 2, "activ": 5, "ad": [3, 9, 10], "adapt": 2, "add": [10, 11, 15, 19], "add_hook": 19, "add_label": 11, "addit": [3, 4, 8, 16, 19], "addition": [3, 19], "address": [2, 8], "adjust": 10, "advanc": 2, "advantag": 18, "advis": 3, "aesthet": [5, 7], "affect": 2, "after": [15, 19], "ag": 2, "again": 9, "aggreg": [11, 17], "aggress": 2, "align": [2, 8, 10], "all": [2, 3, 6, 7, 8, 10, 11, 16, 17, 19], "allow": [2, 18], "along": 19, "alreadi": [3, 18], "also": [2, 9, 15, 16, 17, 19], "alwai": 17, "an": [2, 3, 5, 7, 8, 9, 11, 16, 18, 19], "analysi": [8, 16], "ancient_greek": 7, "andrej": 1, "angl": [8, 10], "ani": [2, 7, 8, 9, 10, 11, 18, 19], "annot": 7, "anot": 17, "anoth": [9, 13, 17], "answer": 2, "anyascii": 11, "anyon": 5, "anyth": 16, "api": [3, 5], "apolog": 2, "apologi": 2, "app": 3, "appear": 2, "appli": [2, 7, 10], "applic": [5, 9], "appoint": 2, "appreci": 15, "appropri": [2, 3, 19], "ar": [2, 3, 4, 6, 7, 8, 10, 11, 12, 16, 17, 19], "arab": 7, "arabic_diacrit": 7, "arabic_lett": 7, "arabic_punctu": 7, "arbitrarili": [5, 9], "arch": [9, 15], "architectur": [5, 9, 15, 16], "area": 19, "argument": [7, 8, 9, 11, 13, 19], "around": 2, "arrai": [8, 10, 11], "art": [5, 16], "artefact": [11, 16, 19], "artefact_typ": 8, "articl": 1, "artifici": [5, 7], "arxiv": [7, 9], "asarrai": 11, "ascii_lett": 7, "aspect": [5, 9, 10, 19], "assess": 11, "assign": 11, "associ": 8, "assum": 9, "assume_straight_pag": [9, 13, 19], "astyp": [9, 11, 19], "attack": 2, "attend": [5, 9], "attent": [2, 9], "autom": 5, "automat": 19, "autoregress": [5, 9], "avail": [2, 5, 6, 10], "averag": [10, 19], "avoid": [2, 4], "aw": [5, 19], "awar": 19, "azur": 19, "b": [9, 11, 19], "b_j": 11, "back": 3, "backbon": 9, "backend": 19, "background": 17, "bangla": 7, "bar": 16, "bar_cod": 17, "baranovskij": 1, "base": [5, 9, 16], "baselin": [5, 9, 19], "batch": [7, 9, 10, 16, 17, 19], "batch_siz": [7, 9, 13, 16, 17, 18], "bblanchon": 4, "bbox": 19, "becaus": 14, "been": [3, 11, 17, 19], "befor": [7, 9, 10, 19], "begin": 11, "behavior": [2, 19], "being": [11, 19], "belong": 19, "benchmark": 19, "best": [1, 2], "better": [12, 19], "between": [10, 11, 19], "bgr": 8, "bilinear": 10, "bin_thresh": 19, "binar": [5, 9, 19], "binari": [8, 18, 19], "bit": 18, "block": [11, 19], "block_1_1": 19, "blur": 10, "bmvc": 7, "bn": 15, "bodi": [2, 19], "bool": [7, 8, 9, 10, 11], "boolean": [9, 19], "both": [5, 7, 10, 17, 19], "bottom": [9, 19], "bound": [7, 8, 9, 10, 11, 16, 17, 19], "box": [7, 8, 9, 10, 11, 
16, 17, 19], "box_thresh": 19, "bright": 10, "browser": [3, 5], "build": [3, 4, 18], "built": 3, "byte": [8, 19], "c": [4, 8, 11], "c_j": 11, "cach": [3, 7, 14], "cache_sampl": 7, "call": 18, "callabl": [7, 10], "can": [3, 4, 13, 14, 15, 16, 17, 19], "capabl": [3, 12, 19], "case": [7, 11], "cf": 19, "cfg": 19, "challeng": 7, "challenge2_test_task12_imag": 7, "challenge2_test_task1_gt": 7, "challenge2_training_task12_imag": 7, "challenge2_training_task1_gt": 7, "chang": [14, 19], "channel": [2, 3, 8, 10], "channel_prior": 4, "channelshuffl": 10, "charact": [5, 7, 8, 11, 17, 19], "charactergener": [7, 17], "characterist": 2, "charg": 19, "charset": 19, "chart": 8, "check": [3, 15, 19], "checkpoint": 9, "chip": 4, "christian": 1, "ci": 3, "clarifi": 2, "clariti": 2, "class": [2, 7, 8, 10, 11, 19], "class_nam": 13, "classif": [17, 19], "classmethod": 8, "clear": 3, "clone": 4, "close": 3, "co": 15, "code": [5, 8, 16], "codecov": 3, "colab": 12, "collate_fn": 7, "collect": [8, 16], "color": 10, "colorinvers": 10, "column": 8, "com": [2, 4, 8, 9, 15], "combin": 19, "command": [3, 16], "comment": 2, "commit": 2, "common": [2, 10, 11, 18], "commun": 2, "compar": 5, "comparison": [11, 19], "competit": 7, "compil": [12, 19], "complaint": 2, "complementari": 11, "complet": 3, "compon": 19, "compos": [7, 19], "comprehens": 19, "comput": [7, 11, 18, 19], "conf_threshold": 16, "confid": [8, 19], "config": [4, 9], "configur": 9, "confus": 11, "consecut": [10, 19], "consequ": 2, "consid": [2, 3, 7, 8, 11, 19], "consist": 19, "consolid": [5, 7], "constant": 10, "construct": 2, "contact": 2, "contain": [1, 6, 7, 12, 17, 19], "content": [7, 8, 19], "context": 9, "contib": 4, "continu": 2, "contrast": 10, "contrast_factor": 10, "contrib": [4, 16], "contribut": 2, "contributor": 3, "convers": 8, "convert": [8, 10], "convolut": 9, "cool": 1, "coordin": [8, 19], "cord": [5, 7, 17, 19], "core": [11, 19], "corner": 19, "correct": 10, "correspond": [4, 8, 10, 19], "could": [2, 16], "counterpart": 11, "cover": 3, "coverag": 3, "cpu": [5, 13, 18], "creat": [1, 15], "crnn": [5, 9, 15], "crnn_mobilenet_v3_larg": [9, 15, 19], "crnn_mobilenet_v3_smal": [9, 18, 19], "crnn_vgg16_bn": [9, 13, 15, 19], "crop": [8, 9, 10, 13, 17, 19], "crop_orient": [8, 19], "crop_orientation_predictor": [9, 13], "crop_param": 13, "cuda": 18, "currenc": 7, "current": [3, 13, 19], "custom": [15, 16, 18, 19], "custom_crop_orientation_model": 13, "custom_page_orientation_model": 13, "customhook": 19, "cvit": 5, "czczup": 9, "czech": 7, "d": [7, 17], "danish": 7, "data": [5, 7, 8, 10, 11, 13, 15], "dataload": 17, "dataset": [9, 13, 19], "dataset_info": 7, "date": [13, 19], "db": 15, "db_mobilenet_v3_larg": [9, 15, 19], "db_resnet34": 19, "db_resnet50": [9, 13, 15, 19], "dbnet": [5, 9], "deal": [12, 19], "decis": 2, "decod": 8, "decode_img_as_tensor": 8, "dedic": 18, "deem": 2, "deep": [9, 19], "def": 19, "default": [4, 8, 13, 14, 19], "defer": 17, "defin": [11, 18], "degre": [8, 10, 19], "degress": 8, "delet": 3, "delimit": 19, "delta": 10, "demo": [3, 5], "demonstr": 2, "depend": [3, 4, 5, 19], "deploi": 3, "deploy": 5, "derogatori": 2, "describ": 9, "descript": 12, "design": 10, "desir": 8, "det_arch": [9, 13, 15, 18], "det_b": 19, "det_model": [13, 15, 18], "det_param": 13, "det_predictor": [13, 19], "detail": [13, 19], "detect": [1, 7, 8, 11, 12, 13, 16], "detect_languag": 9, "detect_orient": [9, 13, 19], "detection_predictor": [9, 19], "detection_task": [7, 17], "detectiondataset": [7, 17], "detectionmetr": 11, "detectionpredictor": [9, 13], 
"detector": [5, 9, 16], "deterior": 9, "determin": 2, "dev": [3, 14], "develop": 4, "deviat": 10, "devic": 18, "dict": [8, 11, 19], "dictionari": [8, 11], "differ": 2, "differenti": [5, 9], "digit": [5, 7, 17], "dimens": [8, 11, 19], "dimension": 10, "direct": 7, "directli": [15, 19], "directori": [3, 14], "disabl": [2, 14, 19], "disable_crop_orient": 19, "disable_page_orient": 19, "disclaim": 19, "discuss": 3, "disparag": 2, "displai": [8, 11], "display_artefact": 11, "distribut": 10, "div": 19, "divers": 2, "divid": 8, "do": [3, 4, 9], "doc": [3, 8, 16, 18, 19], "docartefact": [7, 17], "docstr": 3, "doctr": [1, 4, 13, 14, 15, 16, 17, 18, 19], "doctr_cache_dir": 14, "doctr_multiprocessing_dis": 14, "document": [1, 7, 9, 11, 12, 13, 16, 17, 18, 19], "documentbuild": 19, "documentfil": [8, 13, 15, 16, 18], "doesn": 18, "don": [13, 19], "done": 10, "download": [7, 17], "downsiz": 9, "draw": 10, "drop": 7, "drop_last": 7, "dtype": [8, 9, 10, 11, 18], "dual": [5, 7], "dummi": 15, "dummy_img": 19, "dummy_input": 18, "dure": 2, "dutch": 7, "dynam": [7, 16], "dynamic_seq_length": 7, "e": [2, 3, 4, 8, 9], "each": [5, 7, 8, 9, 10, 11, 17, 19], "eas": 3, "easi": [5, 11, 15, 18], "easili": [8, 11, 13, 15, 17, 19], "econom": 2, "edit": 2, "educ": 2, "effect": 19, "effici": [3, 5, 7, 9], "either": [11, 19], "element": [7, 8, 9, 19], "els": [3, 16], "email": 2, "empathi": 2, "en": 19, "enabl": [7, 8], "enclos": 8, "encod": [5, 7, 8, 9, 19], "encode_sequ": 7, "encount": 3, "encrypt": 8, "end": [5, 7, 9, 11], "english": [7, 17], "enough": [3, 19], "ensur": 3, "entri": 7, "environ": [2, 14], "eo": 7, "equiv": 19, "estim": 9, "etc": [8, 16], "ethnic": 2, "evalu": [17, 19], "event": 2, "everyon": 2, "everyth": [3, 19], "exact": [11, 19], "exampl": [2, 3, 5, 7, 9, 15, 19], "exchang": 18, "execut": 19, "exist": 15, "expand": 10, "expect": [8, 10, 11], "experi": 2, "explan": [2, 19], "explicit": 2, "exploit": [5, 9], "export": [8, 9, 11, 12, 16, 19], "export_as_straight_box": [9, 19], "export_as_xml": 19, "export_model_to_onnx": 18, "express": [2, 10], "extens": 8, "extern": [2, 17], "extract": [1, 5, 7], "extractor": 9, "f_": 11, "f_a": 11, "factor": 10, "fair": 2, "fairli": 2, "fals": [7, 8, 9, 10, 11, 13, 19], "faq": 2, "fascan": 15, "fast": [5, 7, 9], "fast_bas": [9, 19], "fast_smal": [9, 19], "fast_tini": [9, 19], "faster": [5, 9, 18], "fasterrcnn_mobilenet_v3_large_fpn": 9, "favorit": 19, "featur": [4, 9, 11, 12, 13, 16], "feedback": 2, "feel": [3, 15], "felix92": 15, "few": [18, 19], "figsiz": 11, "figur": [11, 16], "file": [3, 7], "final": 9, "find": [3, 17], "fine": 1, "finnish": 7, "first": [3, 7], "firsthand": 7, "fit": [9, 19], "flag": 19, "flip": 10, "float": [8, 10, 11, 18], "float32": [8, 9, 10, 18], "fn": 10, "focu": 15, "focus": [2, 7], "folder": 7, "follow": [2, 3, 4, 7, 10, 11, 13, 14, 15, 16, 19], "font": 7, "font_famili": 7, "foral": 11, "forc": 3, "forg": 4, "form": [5, 7, 19], "format": [8, 11, 13, 17, 18, 19], "forpost": [5, 7], "forum": 3, "found": 1, "fp16": 18, "frac": 11, "framework": [4, 15, 17, 19], "free": [2, 3, 15], "french": [7, 13, 15, 19], "friendli": 5, "from": [1, 2, 5, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19], "from_hub": [9, 15], "from_imag": [8, 15, 16, 18], "from_pdf": 8, "from_url": 8, "full": [7, 11, 19], "function": [7, 10, 11, 16], "funsd": [5, 7, 17, 19], "further": 17, "futur": 7, "g": [8, 9], "g_": 11, "g_x": 11, "gallagh": 1, "gamma": 10, "gaussian": 10, "gaussianblur": 10, "gaussiannois": 10, "gen": 19, "gender": 2, "gener": [3, 5, 8, 9], 
"generic_cyrillic_lett": 7, "geometri": [5, 8, 19], "geq": 11, "german": [7, 13, 15], "get": [18, 19], "git": 15, "github": [3, 4, 9, 15], "give": [2, 16], "given": [7, 8, 10, 11, 19], "global": 9, "go": 19, "good": 18, "googl": 3, "googlevis": 5, "gpu": [5, 16, 18], "gracefulli": 2, "graph": [5, 7, 8], "grayscal": 10, "ground": 11, "groung": 11, "group": [5, 19], "gt": 11, "gt_box": 11, "gt_label": 11, "guid": 3, "guidanc": 17, "gvision": 19, "h": [8, 9, 10], "h_": 11, "ha": [3, 7, 11, 17], "handl": [12, 17, 19], "handwrit": 7, "handwritten": 17, "harass": 2, "hardwar": 19, "harm": 2, "hat": 11, "have": [2, 3, 11, 13, 15, 17, 18, 19], "head": [9, 19], "healthi": 2, "hebrew": 7, "height": [8, 10], "hello": [11, 19], "help": 18, "here": [6, 10, 12, 16, 17, 19], "hf": 9, "hf_hub_download": 9, "high": 8, "higher": [4, 7, 19], "hindi": 7, "hindi_digit": 7, "hocr": 19, "hook": 19, "horizont": [8, 10, 19], "hous": 7, "how": [1, 3, 12, 13, 15, 17], "howev": 17, "hsv": 10, "html": [2, 3, 4, 8, 19], "http": [2, 4, 7, 8, 9, 15, 19], "hub": 9, "hue": 10, "huggingfac": 9, "hw": 7, "i": [2, 3, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18], "i7": 19, "ibrahimov": 1, "ic03": [5, 7, 17], "ic13": [5, 7, 17], "icdar": [5, 7], "icdar2019": 7, "id": 19, "ident": 2, "identifi": 5, "iiit": [5, 7], "iiit5k": [7, 17], "iiithw": [5, 7, 17], "imag": [1, 5, 7, 8, 9, 10, 11, 15, 16, 17, 19], "imagenet": 9, "imageri": 2, "images_90k_norm": 7, "img": [7, 10, 17, 18], "img_cont": 8, "img_fold": [7, 17], "img_path": 8, "img_transform": 7, "imgur5k": [5, 7, 17], "imgur5k_annot": 7, "imlist": 7, "impact": 2, "implement": [7, 8, 9, 10, 11, 19], "import": [7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19], "improv": 9, "inappropri": 2, "incid": 2, "includ": [2, 7, 17, 18], "inclus": 2, "increas": 10, "independ": 10, "index": [3, 8], "indic": 11, "individu": 2, "infer": [5, 9, 10, 16, 19], "inform": [1, 2, 3, 5, 7, 17], "input": [3, 8, 9, 10, 18, 19], "input_crop": 9, "input_pag": [9, 11, 19], "input_shap": 18, "input_tensor": 9, "inspir": [2, 10], "instal": [15, 16, 18], "instanc": [2, 19], "instanti": [9, 19], "instead": [7, 8, 9], "insult": 2, "int": [7, 8, 9, 10], "int64": 11, "integ": 11, "integr": [1, 5, 15, 17], "intel": 19, "interact": [2, 8, 11], "interfac": [15, 18], "interoper": 18, "interpol": 10, "interpret": [7, 8], "intersect": 11, "invert": 10, "investig": 2, "invis": 2, "involv": [2, 19], "io": [13, 15, 16, 18], "iou": 11, "iou_thresh": 11, "iou_threshold": 16, "irregular": [5, 9, 17], "isn": 7, "issu": [2, 3, 15], "italian": 7, "iter": [7, 10, 17, 19], "its": [8, 9, 10, 11, 17, 19], "itself": [9, 15], "j": 11, "jame": 1, "job": 3, "join": 3, "jpeg": 10, "jpegqual": 10, "jpg": [7, 8, 15, 18], "json": [7, 17, 19], "json_output": 19, "jump": 3, "just": 2, "kei": [5, 7], "kera": [9, 18], "kernel": [5, 9, 10], "kernel_shap": 10, "keywoard": 9, "keyword": [7, 8, 9, 11], "kie": [9, 13], "kie_predictor": [9, 13], "kiepredictor": 9, "kind": 2, "know": [3, 18], "kwarg": [7, 8, 9, 11], "l": 11, "l_j": 11, "label": [7, 11, 16, 17], "label_fil": [7, 17], "label_fold": 7, "label_path": [7, 17], "labels_path": [7, 17], "ladder": 2, "lambda": 10, "lambdatransform": 10, "lang": 19, "languag": [2, 5, 7, 8, 9, 15, 19], "larg": [9, 15], "largest": 11, "last": [4, 7], "latenc": 9, "later": 3, "latest": 19, "latin": 7, "layer": 18, "layout": 19, "lead": 2, "leader": 2, "learn": [2, 5, 9, 18, 19], "least": 4, "left": [11, 19], "legacy_french": 7, "length": [7, 19], "less": [18, 19], "level": [2, 7, 11, 19], "leverag": 12, "lf": 15, "librari": [3, 
4, 12, 13], "light": 5, "lightweight": 18, "like": 2, "limits_": 11, "line": [5, 9, 11, 19], "line_1_1": 19, "link": 13, "linknet": [5, 9], "linknet_resnet18": [9, 13, 18, 19], "linknet_resnet34": [9, 18, 19], "linknet_resnet50": [9, 19], "list": [7, 8, 10, 11, 15], "ll": 11, "load": [5, 7, 9, 16, 18], "load_state_dict": 13, "load_weight": 13, "loc_pr": 19, "local": [3, 5, 7, 9, 11, 17, 19], "localis": 7, "localizationconfus": 11, "locat": [3, 8, 19], "login": 9, "login_to_hub": [9, 15], "logo": [8, 16, 17], "love": 15, "lower": [10, 11, 19], "m": [3, 11, 19], "m1": 4, "macbook": 4, "machin": 18, "made": 5, "magc_resnet31": 9, "mai": [2, 3], "mail": 2, "main": 12, "maintain": 5, "mainten": 3, "make": [2, 3, 11, 13, 14, 15, 18, 19], "mani": [17, 19], "manipul": 19, "map": [7, 9], "map_loc": 13, "master": [5, 9, 19], "match": [11, 19], "mathcal": 11, "matplotlib": [8, 11], "max": [7, 10, 11], "max_angl": 10, "max_area": 10, "max_char": [7, 17], "max_delta": 10, "max_gain": 10, "max_gamma": 10, "max_qual": 10, "max_ratio": 10, "maximum": [7, 10], "maxval": [9, 10], "mbox": 11, "mean": [10, 11, 13], "meaniou": 11, "meant": [8, 18], "measur": 19, "media": 2, "median": 9, "meet": 13, "member": 2, "memori": [14, 18], "mention": 19, "merg": 7, "messag": 3, "meta": 19, "metadata": 18, "metal": 4, "method": [8, 10, 19], "metric": [11, 19], "middl": 19, "might": [18, 19], "min": 10, "min_area": 10, "min_char": [7, 17], "min_gain": 10, "min_gamma": 10, "min_qual": 10, "min_ratio": 10, "min_val": 10, "minde": [1, 2, 4, 5, 9], "minim": [3, 5], "minimalist": [5, 9], "minimum": [4, 7, 10, 11, 19], "minval": 10, "miss": 4, "mistak": 2, "mixed_float16": 18, "mixed_precis": 18, "mjsynth": [5, 7, 17], "mnt": 7, "mobilenet": [9, 15], "mobilenet_v3_larg": 9, "mobilenet_v3_large_r": 9, "mobilenet_v3_smal": [9, 13], "mobilenet_v3_small_crop_orient": [9, 13], "mobilenet_v3_small_page_orient": [9, 13], "mobilenet_v3_small_r": 9, "mobilenetv3": 9, "modal": [5, 7], "mode": 4, "model": [7, 11, 14, 16, 17], "model_nam": [9, 15, 18], "model_path": [16, 18], "moder": 2, "modif": 3, "modifi": [9, 14, 19], "modul": [4, 8, 9, 10, 11, 19], "more": [3, 17, 19], "moscardi": 1, "most": 19, "mozilla": 2, "multi": [5, 9], "multilingu": [7, 15], "multipl": [7, 8, 10, 19], "multipli": 10, "multiprocess": 14, "my": 9, "my_awesome_model": 15, "my_hook": 19, "n": [7, 11], "name": [7, 9, 18, 19], "nation": 2, "natur": [2, 5, 7], "ndarrai": [7, 8, 10, 11], "necessari": [4, 13, 14], "need": [3, 4, 7, 11, 13, 14, 15, 16, 19], "neg": 10, "nest": 19, "netraj": 1, "network": [5, 7, 9, 18], "neural": [5, 7, 9, 18], "new": [3, 11], "next": [7, 17], "nois": 10, "noisi": [5, 7], "non": [5, 7, 8, 9, 10, 11], "none": [7, 8, 9, 10, 11, 19], "normal": [9, 10], "norwegian": 7, "note": [0, 3, 7, 9, 13, 15, 16, 18], "now": 3, "np": [9, 10, 11, 19], "num_output_channel": 10, "num_sampl": [7, 17], "number": [7, 9, 10, 11, 19], "numpi": [8, 9, 11, 19], "o": 4, "obb": 16, "obj_detect": 15, "object": [7, 8, 11, 16, 19], "objectness_scor": [8, 19], "oblig": 2, "obtain": 19, "occupi": 18, "ocr": [1, 5, 7, 9, 11, 15], "ocr_carea": 19, "ocr_db_crnn": 11, "ocr_lin": 19, "ocr_pag": 19, "ocr_par": 19, "ocr_predictor": [9, 13, 15, 18, 19], "ocrdataset": [7, 17], "ocrmetr": 11, "ocrpredictor": [9, 13], "ocrx_word": 19, "offens": 2, "offici": [2, 9], "offlin": 2, "offset": 10, "onc": 19, "one": [3, 7, 9, 10, 13, 15, 19], "oneof": 10, "ones": [7, 11], "onli": [3, 9, 10, 11, 13, 15, 17, 18, 19], "onlin": 2, "onnx": 16, "onnxruntim": [16, 18], "onnxtr": 18, "opac": 10, 
"opacity_rang": 10, "open": [1, 2, 3, 15, 18], "opinion": 2, "optic": [5, 19], "optim": [5, 19], "option": [7, 9, 13], "order": [3, 7, 8, 10], "org": [2, 7, 9, 19], "organ": 8, "orient": [2, 8, 9, 12, 16, 19], "orientationpredictor": 9, "other": [2, 3], "otherwis": [2, 8, 11], "our": [1, 3, 9, 19], "out": [3, 9, 10, 11, 19], "outpout": 19, "output": [8, 10, 18], "output_s": [8, 10], "outsid": 14, "over": [7, 11, 19], "overal": [2, 9], "overlai": 8, "overview": 16, "overwrit": 13, "overwritten": 15, "own": 5, "p": [10, 19], "packag": [3, 5, 11, 14, 16, 17, 18], "pad": [7, 9, 10, 19], "page": [4, 7, 9, 11, 13, 19], "page1": 8, "page2": 8, "page_1": 19, "page_idx": [8, 19], "page_orientation_predictor": [9, 13], "page_param": 13, "pair": 11, "paper": 9, "par_1_1": 19, "paragraph": 19, "paragraph_break": 19, "parallel": 9, "param": [10, 19], "paramet": [5, 8, 9, 18], "pars": [5, 7], "parseq": [5, 9, 15, 18, 19], "part": [7, 10, 19], "parti": 4, "partial": 19, "particip": 2, "pass": [7, 8, 9, 13, 19], "password": 8, "patch": [9, 11], "path": [7, 8, 16, 17, 18], "path_to_checkpoint": 13, "path_to_custom_model": 18, "path_to_pt": 13, "patil": 1, "pattern": 2, "pdf": [8, 9, 12], "pdfpage": 8, "peopl": 2, "per": [10, 19], "perform": [5, 8, 9, 10, 11, 14, 18, 19], "period": 2, "permiss": 2, "permut": [5, 9], "persian_lett": 7, "person": [2, 17], "phase": 19, "photo": 17, "physic": [2, 8], "pick": 10, "pictur": 8, "pip": [3, 4, 16, 18], "pipelin": 19, "pixel": [8, 10, 19], "pleas": 3, "plot": 11, "plt": 11, "plug": 15, "plugin": 4, "png": 8, "point": 18, "polici": 14, "polish": 7, "polit": 2, "polygon": [7, 11, 19], "pool": 9, "portugues": 7, "posit": [2, 11], "possibl": [3, 11, 15, 19], "post": [2, 19], "postprocessor": 19, "potenti": 9, "power": 5, "ppageno": 19, "pre": [3, 9, 18], "precis": [11, 19], "pred": 11, "pred_box": 11, "pred_label": 11, "predefin": 17, "predict": [8, 9, 11, 19], "predictor": [5, 8, 9, 12, 13, 15, 18], "prefer": 17, "preinstal": 4, "preprocessor": [13, 19], "prerequisit": 15, "present": 12, "preserv": [9, 10, 19], "preserve_aspect_ratio": [8, 9, 10, 13, 19], "pretrain": [5, 9, 11, 13, 18, 19], "pretrained_backbon": [9, 13], "print": 19, "prior": 7, "privaci": 2, "privat": 2, "probabl": [1, 10], "problem": 3, "procedur": 10, "process": [3, 5, 8, 9, 13, 19], "processor": 19, "produc": [12, 19], "product": 18, "profession": 2, "project": [3, 17], "promptli": 2, "proper": 3, "properli": 7, "provid": [2, 3, 5, 15, 16, 17, 19], "public": [2, 5], "publicli": 19, "publish": 2, "pull": 15, "punctuat": 7, "pure": 7, "purpos": 3, "push_to_hf_hub": [9, 15], "py": 15, "pypdfium2": [4, 8], "pyplot": [8, 11], "python": [1, 3, 16], "python3": 15, "pytorch": [4, 5, 9, 10, 13, 15, 18, 19], "q": 3, "qr": [8, 16], "qr_code": 17, "qualiti": 10, "question": 2, "quickli": 5, "quicktour": 12, "r": 19, "race": 2, "ramdisk": 7, "rand": [9, 10, 11, 18, 19], "random": [9, 10, 11, 19], "randomappli": 10, "randombright": 10, "randomcontrast": 10, "randomcrop": 10, "randomgamma": 10, "randomhorizontalflip": 10, "randomhu": 10, "randomjpegqu": 10, "randomli": 10, "randomres": 10, "randomrot": 10, "randomsatur": 10, "randomshadow": 10, "rang": 10, "rassi": 15, "ratio": [9, 10, 19], "raw": [8, 11], "re": 18, "read": [5, 7, 9], "read_html": 8, "read_img_as_numpi": 8, "read_img_as_tensor": 8, "read_pdf": 8, "readi": 18, "real": [1, 5, 9, 10], "realli": 1, "reason": [2, 5, 7], "rebuild": 3, "rebuilt": 3, "recal": [11, 19], "receipt": [5, 7, 19], "reco_arch": [9, 13, 15, 18], "reco_b": 19, "reco_model": [13, 
15, 18], "reco_param": 13, "reco_predictor": 13, "recogn": 19, "recognit": [7, 11, 12, 13], "recognition_predictor": [9, 19], "recognition_task": [7, 17], "recognitiondataset": [7, 17], "recognitionpredictor": [9, 13], "rectangular": 9, "reduc": [4, 10], "refer": [3, 4, 13, 15, 16, 17, 19], "regardless": 2, "region": 19, "regroup": 11, "regular": 17, "reject": 2, "rel": [8, 10, 11, 19], "relat": 8, "releas": [0, 4], "relev": 16, "religion": 2, "remov": 2, "render": [8, 19], "repo": 9, "repo_id": [9, 15], "report": 2, "repositori": [7, 9, 15], "repres": [2, 18, 19], "represent": [5, 9], "request": [2, 15], "requir": [4, 10, 18], "research": 5, "residu": 9, "resiz": [10, 19], "resnet": 9, "resnet18": [9, 15], "resnet31": 9, "resnet34": 9, "resnet50": [9, 15], "resolv": 8, "resolve_block": 19, "resolve_lin": 19, "resourc": 17, "respect": 2, "rest": [3, 10, 11], "restrict": 14, "result": [3, 7, 8, 12, 15, 18, 19], "return": 19, "reusabl": 19, "review": 2, "rgb": [8, 10], "rgb_mode": 8, "rgb_output": 8, "right": [2, 9, 11], "roboflow": 1, "robust": [5, 7], "root": 7, "rotat": [7, 8, 9, 10, 11, 12, 13, 17, 19], "run": [3, 4, 9], "same": [3, 8, 11, 17, 18, 19], "sampl": [7, 9, 17, 19], "sample_transform": 7, "sanjin": 1, "sar": [5, 9], "sar_resnet31": [9, 19], "satur": 10, "save": [9, 17], "scale": [8, 9, 10, 11], "scale_rang": 10, "scan": [5, 7], "scene": [5, 7, 9], "score": [8, 11], "script": [3, 17], "seamless": 5, "seamlessli": [5, 19], "search": [1, 9], "searchabl": 12, "sec": 19, "second": 19, "section": [1, 13, 15, 16, 18, 19], "secur": [2, 14], "see": [2, 3], "seen": 19, "segment": [5, 9, 19], "self": 19, "semant": [5, 9], "send": 19, "sens": 11, "sensit": 17, "separ": 19, "sequenc": [5, 7, 8, 9, 11, 19], "sequenti": [10, 19], "seri": 2, "seriou": 2, "set": [2, 4, 7, 9, 11, 14, 16, 19], "set_global_polici": 18, "sever": [8, 10, 19], "sex": 2, "sexual": 2, "shade": 10, "shape": [5, 8, 9, 10, 11, 19], "share": [14, 17], "shift": 10, "shm": 14, "should": [3, 7, 8, 10, 11], "show": [5, 8, 9, 11, 13, 15, 16], "showcas": [3, 12], "shuffl": [7, 10], "side": 11, "signatur": 8, "signific": 17, "simpl": [5, 9, 18], "simpler": 9, "sinc": [7, 17], "singl": [2, 3, 5, 7], "single_img_doc": 18, "size": [2, 7, 8, 10, 16, 19], "skew": 19, "slack": 3, "slightli": 9, "small": [3, 9, 19], "smallest": 8, "snapshot_download": 9, "snippet": 19, "so": [3, 4, 7, 9, 15, 17], "social": 2, "socio": 2, "some": [1, 4, 12, 15, 17], "someth": 3, "somewher": 3, "sort": 2, "sourc": [1, 7, 8, 9, 10, 11, 15], "space": [2, 19], "span": 19, "spanish": 7, "spatial": [5, 7, 8], "specif": [3, 4, 11, 13, 17, 19], "specifi": [2, 7, 8], "speed": [5, 9, 19], "sphinx": 3, "sroie": [5, 7, 17], "stabl": 4, "stackoverflow": 3, "stage": 5, "standalon": 12, "standard": 10, "start": 7, "state": [1, 5, 11, 16], "static": 11, "statist": 1, "statu": 2, "std": [10, 13], "step": 14, "still": 19, "str": [7, 8, 9, 10, 11], "straight": [7, 9, 17, 19], "straighten": 19, "straighten_pag": [9, 13, 19], "straigten_pag": 13, "stream": 8, "street": [5, 7], "strict": 4, "strictli": 11, "string": [7, 8, 11, 19], "strive": 4, "strong": [5, 9], "structur": [18, 19], "subset": [7, 19], "suggest": [3, 15], "sum": 11, "summari": 11, "support": [4, 13, 16, 18, 19], "sustain": 2, "svhn": [5, 7, 17], "svt": [7, 17], "swedish": 7, "symmetr": [9, 10, 19], "symmetric_pad": [9, 10, 19], "synthet": 5, "synthtext": [5, 7, 17], "system": 19, "t": [3, 7, 13, 18, 19], "tabl": [15, 16, 17], "take": [2, 7, 19], "target": [7, 8, 10, 11, 17], "target_s": 7, "task": [5, 7, 9, 
15, 17, 19], "task2": 7, "team": 4, "techminde": 4, "templat": [3, 5], "tensor": [7, 8, 10, 19], "tensorflow": [4, 5, 8, 9, 10, 13, 15, 18, 19], "tensorspec": 18, "term": 2, "test": [7, 17], "test_set": 7, "text": [1, 7, 8, 9, 11, 17], "text_output": 19, "textmatch": 11, "textnet": 9, "textnet_bas": 9, "textnet_smal": 9, "textnet_tini": 9, "textract": [5, 19], "textstylebrush": [5, 7], "textual": [5, 7, 8, 9, 19], "tf": [4, 8, 9, 10, 15, 18], "than": [3, 11, 15], "thank": 3, "thei": [2, 11], "them": [7, 19], "thi": [1, 2, 3, 4, 6, 7, 10, 11, 13, 14, 15, 17, 18, 19], "thing": [18, 19], "third": 4, "those": [2, 8, 19], "threaten": 2, "threshold": 19, "through": [2, 10, 16, 17], "tilman": 15, "time": [1, 2, 5, 9, 11, 17], "tini": 9, "titl": [8, 19], "tm": 19, "tmp": 14, "togeth": [3, 8], "tograi": 10, "tool": [1, 17], "top": [11, 18, 19], "topic": 3, "torch": [4, 10, 13, 15, 18], "torchvis": 10, "total": 13, "toward": [2, 4], "train": [3, 7, 9, 10, 15, 16, 17, 18, 19], "train_it": [7, 17], "train_load": [7, 17], "train_pytorch": 15, "train_set": [7, 17], "train_tensorflow": 15, "trainabl": [5, 9], "tranform": 10, "transcrib": 19, "transfer": [5, 7], "transfo": 10, "transform": [5, 7, 9], "translat": 2, "troll": 2, "true": [7, 8, 9, 10, 11, 13, 14, 15, 17, 18, 19], "truth": 11, "tune": [1, 18], "tupl": [7, 8, 10, 11], "two": [8, 14], "txt": 7, "type": [8, 11, 15, 18, 19], "typic": 19, "u": [2, 3], "ucsd": 7, "udac": 3, "uint8": [8, 9, 11, 19], "ukrainian": 7, "unaccept": 2, "underli": [17, 19], "underneath": 8, "understand": [5, 7, 19], "uniform": [9, 10], "uniformli": 10, "uninterrupt": [8, 19], "union": 11, "unit": 1, "unittest": 3, "unlock": 8, "unoffici": 9, "unprofession": 2, "unsolicit": 2, "unsupervis": 5, "unwelcom": 2, "up": [9, 19], "updat": 11, "upgrad": 3, "upper": [7, 10], "uppercas": 17, "url": 8, "us": [2, 3, 4, 7, 9, 11, 12, 13, 14, 15, 16, 19], "usabl": 19, "usag": [14, 18], "use_polygon": [7, 11, 17], "useabl": 19, "user": [5, 8, 12], "utf": 19, "util": 18, "v1": 15, "v3": [9, 15, 19], "valid": 17, "valu": [3, 8, 10, 19], "valuabl": 5, "variabl": 14, "varieti": 7, "veri": 9, "verma": 1, "version": [2, 3, 4, 18, 19], "vgg": 9, "vgg16": 15, "vgg16_bn_r": 9, "via": 2, "video": 1, "vietnames": 7, "view": [5, 7], "viewpoint": 2, "violat": 2, "visibl": 2, "vision": [5, 7, 9], "visiondataset": 7, "visiontransform": 9, "visual": [4, 5, 16], "visualize_pag": 11, "vit_": 9, "vit_b": 9, "vitstr": [5, 9, 18], "vitstr_bas": [9, 19], "vitstr_smal": [9, 13, 18, 19], "viz": 4, "vocab": [13, 15, 17, 18, 19], "vocabulari": [7, 13, 15], "w": [8, 9, 10, 11], "w3": 19, "wa": 2, "wai": [2, 5, 17], "want": [3, 18, 19], "warmup": 19, "wasn": 3, "we": [1, 2, 3, 4, 5, 8, 10, 13, 15, 17, 18, 19], "weasyprint": 8, "web": [3, 8], "websit": 7, "welcom": 2, "well": [1, 2, 18], "were": [2, 8, 19], "what": [1, 2], "when": [2, 3, 9], "whenev": 3, "where": [3, 8, 10, 11], "whether": [3, 7, 8, 10, 11, 17, 19], "which": [2, 9, 14, 16, 17, 19], "whichev": 4, "while": [10, 19], "why": 2, "width": [8, 10], "wiki": 2, "wildreceipt": [5, 7, 17], "window": [9, 11], "wish": 3, "within": 2, "without": [2, 7, 9], "wonder": 3, "word": [5, 7, 9, 11, 19], "word_1_1": 19, "word_1_2": 19, "word_1_3": 19, "wordgener": [7, 17], "words_onli": 11, "work": [1, 13, 14, 19], "workflow": 3, "worklow": 3, "world": [11, 19], "worth": 9, "wrap": 19, "wrapper": [7, 10], "write": 14, "written": [2, 8], "www": [2, 8, 19], "x": [8, 10, 11], "x_ascend": 19, "x_descend": 19, "x_i": 11, "x_size": 19, "x_wconf": 19, "xhtml": 19, "xmax": 8, 
"xmin": 8, "xml": 19, "xml_bytes_str": 19, "xml_element": 19, "xml_output": 19, "xmln": 19, "y": 11, "y_i": 11, "y_j": 11, "yet": 16, "ymax": 8, "ymin": 8, "yolov8": 16, "you": [3, 4, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19], "your": [3, 5, 8, 11, 19], "yoursit": 8, "yugesh": 1, "zero": [10, 11], "zoo": 13, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7\u00e0\u00e2\u00e9\u00e8\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00e7": 7, "\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7\u00e0\u00e2\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc\u00e7": 7, "\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa\u00e0\u00e8\u00e9\u00ec\u00ed\u00ee\u00f2\u00f3\u00f9\u00fa": 7, "\u00e1\u00e0\u00e2\u00e3\u00e9\u00ea\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7\u00e1\u00e0\u00e2\u00e3\u00e9\u00eb\u00ed\u00ef\u00f3\u00f4\u00f5\u00fa\u00fc\u00e7": 7, "\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5\u00e1\u00e0\u1ea3\u1ea1\u00e3\u0103\u1eaf\u1eb1\u1eb3\u1eb5\u1eb7\u00e2\u1ea5\u1ea7\u1ea9\u1eab\u1ead\u0111\u00e9\u00e8\u1ebb\u1ebd\u1eb9\u00ea\u1ebf\u1ec1\u1ec3\u1ec5\u1ec7\u00f3\u00f2\u1ecf\u00f5\u1ecd\u00f4\u1ed1\u1ed3\u1ed5\u1ed9\u1ed7\u01a1\u1edb\u1edd\u1edf\u1ee3\u1ee1\u00fa\u00f9\u1ee7\u0169\u1ee5\u01b0\u1ee9\u1eeb\u1eed\u1eef\u1ef1i\u00ed\u00ec\u1ec9\u0129\u1ecb\u00fd\u1ef3\u1ef7\u1ef9\u1ef5": 7, "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1": 7, "\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e\u00e1\u010d\u010f\u00e9\u011b\u00ed\u0148\u00f3\u0159\u0161\u0165\u00fa\u016f\u00fd\u017e": 7, "\u00e4\u00f6\u00e4\u00f6": 7, "\u00e4\u00f6\u00fc\u00df\u00e4\u00f6\u00fc\u00df": 7, "\u00e5\u00e4\u00f6\u00e5\u00e4\u00f6": 7, "\u00e6\u00f8\u00e5\u00e6\u00f8\u00e5": 7, "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c": 7, "\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f": 7, "\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f\u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u0439\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448\u0449\u044c\u044e\u044f0123456789": 7, "\u0491\u0456\u0457\u0454\u0491\u0456\u0457\u0454": 7, "\u05d0\u05d1\u05d2\u05d3\u05d4\u05d5\u05d6\u05d7\u05d8\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6\u05e7\u05e8\u05e9\u05ea": 7, 
"\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a": 7, "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063a\u0640\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u0648\u0649\u064a\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669": 7, "\u067e\u0686\u06a2\u06a4\u06af": 7, "\u0905": 7, "\u0905\u0906\u0907\u0908\u0909\u090a\u090b\u0960\u090c\u0961\u090f\u0910\u0913\u0914\u0905": 7, "\u0915\u0916\u0917\u0918\u0919\u091a\u091b\u091c\u091d\u091e\u091f\u0920\u0921\u0922\u0923\u0924\u0925\u0926\u0927\u0928\u092a\u092b\u092c\u092d\u092e\u092f\u0930\u0932\u0935\u0936\u0937\u0938\u0939\u0966\u0967\u0968\u0969\u096a\u096b\u096c\u096d\u096e\u096f": 7, "\u0950": 7, "\u0985\u0986\u0987\u0988\u0989\u098a\u098b\u098f\u0990\u0993\u0994\u0995\u0996\u0997\u0998\u0999\u099a\u099b\u099c\u099d\u099e\u099f\u09a0\u09a1\u09a2\u09a3\u09a4\u09a5\u09a6\u09a7\u09a8\u09aa\u09ab\u09ac\u09ad\u09ae\u09af\u09b0\u09b2\u09b6\u09b7\u09b8\u09b9": 7, "\u09bd": 7, "\u09ce": 7, "\u09e6\u09e7\u09e8\u09e9\u09ea\u09eb\u09ec\u09ed\u09ee\u09ef": 7}, "titles": ["Changelog", "Community resources", "Contributor Covenant Code of Conduct", "Contributing to docTR", "Installation", "docTR: Document Text Recognition", "doctr.contrib", "doctr.datasets", "doctr.io", "doctr.models", "doctr.transforms", "doctr.utils", "docTR Notebooks", "Train your own model", "AWS Lambda", "Share your model with the community", "Integrate contributions into your pipeline", "Choose a ready to use dataset", "Preparing your model for inference", "Choosing the right model"], "titleterms": {"": 3, "0": 0, "01": 0, "02": 0, "03": 0, "04": 0, "05": 0, "07": 0, "08": 0, "09": 0, "1": [0, 2], "10": 0, "11": 0, "12": 0, "18": 0, "2": [0, 2], "2021": 0, "2022": 0, "2023": 0, "2024": 0, "21": 0, "22": 0, "27": 0, "28": 0, "29": 0, "3": [0, 2], "31": 0, "4": [0, 2], "5": 0, "6": 0, "7": 0, "8": 0, "9": 0, "advanc": 19, "approach": 19, "architectur": 19, "arg": [7, 8, 9, 10, 11], "artefact": 8, "artefactdetect": 16, "attribut": 2, "avail": [16, 17, 19], "aw": 14, "ban": 2, "block": 8, "bug": 3, "changelog": 0, "choos": [17, 19], "classif": [9, 13, 15], "code": [2, 3], "codebas": 3, "commit": 3, "commun": [1, 15], "compos": 10, "conda": 4, "conduct": 2, "connect": 3, "continu": 3, "contrib": 6, "contribut": [3, 6, 16], "contributor": 2, "convent": 15, "correct": 2, "coven": 2, "custom": [7, 13], "data": 17, "dataload": 7, "dataset": [5, 7, 17], "detect": [5, 9, 15, 17, 19], "develop": 3, "do": 19, "doctr": [3, 5, 6, 7, 8, 9, 10, 11, 12], "document": [3, 5, 8], "end": 19, "enforc": 2, "evalu": 11, "export": 18, "factori": 9, "featur": [3, 5], "feedback": 3, "file": 8, "from": 15, "gener": [7, 17], "git": 4, "guidelin": 2, "half": 18, "hub": 15, "huggingfac": 15, "i": 19, "infer": 18, "instal": [3, 4], "integr": [3, 16], "io": 8, "lambda": 14, "let": 3, "line": 8, "linux": 4, "load": [13, 15, 17], "loader": 7, "main": 5, "mode": 3, "model": [5, 9, 13, 15, 18, 19], "modifi": 3, "modul": [6, 16], "name": 15, "notebook": 12, "object": 17, "ocr": [17, 19], "onli": 4, "onnx": 18, "optim": 18, "option": 19, "orient": 13, "our": 2, "output": 19, "own": [13, 17], "packag": 4, "page": 8, "perman": 2, "pipelin": 16, "pledg": 2, "precis": 18, "predictor": 19, "prepar": 18, 
"prerequisit": 4, "pretrain": 15, "push": 15, "python": 4, "qualiti": 3, "question": 3, "read": 8, "readi": 17, "recognit": [5, 9, 15, 17, 19], "report": 3, "request": 3, "resourc": 1, "respons": 2, "return": [7, 8, 9, 11], "right": 19, "scope": 2, "share": 15, "should": 19, "stage": 19, "standard": 2, "structur": [3, 8], "style": 3, "support": [5, 6, 7, 10], "synthet": [7, 17], "task": 11, "temporari": 2, "test": 3, "text": [5, 19], "train": 13, "transform": 10, "two": 19, "unit": 3, "us": [17, 18], "util": 11, "v0": 0, "verif": 3, "via": 4, "visual": 11, "vocab": 7, "warn": 2, "what": 19, "word": 8, "your": [13, 15, 16, 17, 18], "zoo": [5, 9]}})
\ No newline at end of file
diff --git a/v0.1.1/using_doctr/custom_models_training.html b/v0.1.1/using_doctr/custom_models_training.html
index df39d8d568..b714c1f971 100644
--- a/v0.1.1/using_doctr/custom_models_training.html
+++ b/v0.1.1/using_doctr/custom_models_training.html
@@ -14,7 +14,7 @@
-
+
Train your own model - docTR documentation
@@ -619,7 +619,7 @@ Loading your custom trained orientation classification model
-
+
diff --git a/v0.1.1/using_doctr/running_on_aws.html b/v0.1.1/using_doctr/running_on_aws.html
index 16ceaca7a1..808ea541cd 100644
--- a/v0.1.1/using_doctr/running_on_aws.html
+++ b/v0.1.1/using_doctr/running_on_aws.html
@@ -14,7 +14,7 @@
-
+
AWS Lambda - docTR documentation
@@ -362,7 +362,7 @@ AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
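The rewritten CORD constructor above replaces `sample_transforms` with three target modes. A minimal usage sketch, inferred only from the parameters and branches shown in this hunk (not an excerpt from the rendered docs):

>>> from doctr.datasets import CORD
>>> # Default mode: dict targets with straight boxes (xmin, ymin, xmax, ymax) and text labels
>>> train_set = CORD(train=True, download=True)
>>> img, target = train_set[0]
>>> # use_polygons=True: each box becomes a (4, 2) array of corner coordinates instead
>>> poly_set = CORD(train=True, download=True, use_polygons=True)
>>> # recognition_task=True: samples become (word crop, text label) pairs
>>> reco_set = CORD(train=True, download=True, recognition_task=True)
>>> crop, label = reco_set[0]
>>> # Setting recognition_task and detection_task together raises a ValueError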
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
- doctr.datasets.core - docTR documentation
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
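FUNSD gains the same three target switches, with one extra filter: in recognition mode, crops whose labels contain checkbox glyphs are skipped. A short sketch under the same assumptions as the CORD example above:

>>> from doctr.datasets import FUNSD
>>> det_set = FUNSD(train=True, download=True, detection_task=True)
>>> img, boxes = det_set[0]  # float32 boxes of shape (N, 4), or (N, 4, 2) with use_polygons=True
>>> reco_set = FUNSD(train=True, download=True, recognition_task=True)
>>> crop, label = reco_set[0]  # labels containing unknown characters (e.g. the checkbox glyphs) are filtered out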
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
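The DataLoader rewrite above swaps the multithreaded `workers` option for a `collate_fn` override and adds `__len__`. A minimal sketch assuming only what this hunk shows:

>>> from doctr.datasets import CORD, DataLoader
>>> train_set = CORD(train=True, download=True)
>>> train_loader = DataLoader(train_set, batch_size=32, drop_last=True)
>>> len(train_loader)  # number of batches, via the new __len__
>>> images, targets = next(iter(train_loader))
>>> # Illustrative override: keep raw (img, target) tuples instead of the default tensor stacking
>>> raw_loader = DataLoader(train_set, batch_size=8, collate_fn=lambda samples: list(samples))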
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding boxes (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
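SROIE now parses each annotation file once into an (N, 4, 2) coordinate stack, reducing it to straight boxes only when `use_polygons` is left False. A quick sketch, again assuming only what this hunk shows:

>>> from doctr.datasets import SROIE
>>> train_set = SROIE(train=True, download=True, use_polygons=True)
>>> img, target = train_set[0]  # target["boxes"]: float32 array of shape (N, 4, 2)
>>> # With use_polygons=False (default), boxes collapse to (N, 4) as xmin, ymin, xmax, ymax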
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated into the given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionnary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or the char is still not in the vocab, use the unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their class names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
-
-
+
+
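The `sos`/`pad` handling in `encode_sequences` above is easy to misread, so here is a worked example traced by hand through the code in this hunk (the import path assumes the function is re-exported from `doctr.datasets`, as elsewhere in these docs; all special symbols must lie outside the vocab's index range):

>>> from doctr.datasets import encode_sequences
>>> vocab = "abc"  # indices 0..2, so 3, 4 and 5 are valid special symbols
>>> encode_sequences(["ab", "c"], vocab, eos=3, sos=4, pad=5)
array([[4, 0, 1, 3, 5],
       [4, 2, 3, 5, 5]], dtype=int32)
>>> # Each word is encoded, EOS-terminated, PAD-filled, then SOS is rolled into position 0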
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- doctr.documents.elements - docTR documentation
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
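-
- # Minimal composition sketch (hypothetical values), using the classes above:
- # word = Word("hello", 0.99, ((0.1, 0.1), (0.3, 0.2)))
- # page = Page([Block(lines=[Line([word])])], page_idx=0, dimensions=(800, 600))
- # assert Document([page]).render() == "hello"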
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents.reader - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of the image in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document loaded as a PyMuPDF (fitz) Document
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 PDF;
- to increase the resolution while preserving the A4 aspect ratio, you can pass (1024, 726) for instance
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the original one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
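-
- # Rendering sketch (hypothetical path), combining the readers above:
- # doc = read_pdf("path/to/your/doc.pdf")
- # pages = [convert_page_to_numpy(page, output_size=(1024, 726)) for page in doc]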
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.models.detection.differentiable_binarization - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to expand (unshrink) polygons
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: The first parameter.
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
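-
- # Expansion sketch: for a 100x100 axis-aligned square with unclip_ratio=1.5,
- # the offset distance is area * ratio / perimeter = 10000 * 1.5 / 400 = 37.5,
- # so the returned (x, y, w, h) box is roughly 175 pixels wide and high:
- # square = np.array([[0, 0], [100, 0], [100, 100], [0, 100]])
- # box = DBPostProcessor().polygon_to_box(square)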
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
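-
- # Post-processing sketch on synthetic maps: a square filled in both the
- # probability map and its binarized counterpart yields one high-score box
- # in relative coordinates:
- # pred = np.zeros((256, 256), dtype=np.float32)
- # pred[64:192, 64:192] = 0.9
- # boxes = DBPostProcessor().bitmap_to_boxes(pred, (pred > 0.3).astype(np.uint8))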
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1):
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature maps is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs: map of x coordinates (height, width)
- ys: map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
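-
- # Sanity sketch: the point (0, 1) lies at distance 1 from the segment
- # [(0, 0), (2, 0)], and the law-of-cosines formulation above recovers it:
- # xs, ys = np.array([[0.0]]), np.array([[1.0]])
- # d = DBNet.compute_distance(xs, ys, np.array([0.0, 0.0]), np.array([2.0, 0.0]))
- # assert np.isclose(d[0, 0], 1.0)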
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon: array of coordinates defining the polygon boundary
- canvas: threshold map to fill with polygons
- mask: mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
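-
- # Loss composition, following the DB paper: the total objective is
- # L = l1_scale * L_thresh + bce_scale * L_prob + L_dice,
- # i.e. weights of 10, 5 and 1 respectively in this implementation.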
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.models.detection.linknet - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binzarized p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num):
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
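-
- # Channel sketch: decoder_block(256, 128) maps a (N, H, W, 256) tensor to
- # (N, 2H, 2W, 128) via a 1x1 bottleneck to 64 channels, a stride-2
- # transposed convolution, and a final 1x1 projection to out_chan.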
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
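-
- # Shape sketch: LinkNet's stem (see below) maps a 512x512 input to a
- # (N, 128, 128, 64) map; the four encoder stages halve it down to
- # (N, 8, 8, 512), and each decoder block doubles the resolution and is
- # summed with the matching encoder output, so this call returns a
- # (N, 128, 128, 64) feature map.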
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
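-
- # Note: unlike DBNet, LinkNet trains with a single masked BCE term between
- # the segmentation logits and the box-filled targets; there is no threshold
- # branch or dice component here.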
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.models.export - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.models.recognition.crnn - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
- with label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
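-# A minimal greedy-decoding sketch mirroring ctc_decoder above, on dummy logits
-# (eager mode assumed; the last class index plays the role of the CTC blank):
-# >>> logits = tf.random.normal((2, 32, len(vocab) + 1))
-# >>> sparse = tf.nn.ctc_greedy_decoder(
-# ...     tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
-# ...     tf.fill([2], 32), merge_repeated=True)[0][0]
-# >>> dense = tf.sparse.to_dense(sparse, default_value=len(vocab))  # (2, <=32)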
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of ground-truth character sequences
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
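-# Usage sketch for the call contract above (hypothetical tensors; in training,
-# pass targets to obtain a CTC loss, at inference read the decoded words):
-# >>> x = tf.random.uniform(shape=[2, 32, 128, 3], maxval=1, dtype=tf.float32)
-# >>> train_out = model(x, target=['hello', 'world'], training=True)  # train_out['loss']
-# >>> eval_out = model(x, return_preds=True)  # eval_out['preds'] is a list of 2 words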
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
- doctr.models.recognition.sar - docTR documentation
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
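-# Shape sketch for the module above: with features (N, H, W, C) and a hidden
-# state (N, 1, 1, rnn_units), the softmax weights sum to 1 over the H * W
-# positions, so the glimpse is a convex combination of feature vectors:
-# >>> attn = tf.nn.softmax(tf.random.normal((2, 8 * 32)))
-# >>> tf.reduce_sum(attn, axis=-1)  # ~[1., 1.]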
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, 1)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + 1) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
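-    # Masking sketch: tf.sequence_mask keeps loss terms up to <eos> only, e.g.
-    # >>> tf.sequence_mask([2, 4], 5)
-    # [[ True,  True, False, False, False],
-    #  [ True,  True,  True,  True, False]]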
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example:
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
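+# A minimal sketch of the two accepted forms (weight download assumed to work):
+# >>> from doctr.models import recognition_predictor, crnn_vgg16_bn
+# >>> by_name = recognition_predictor("crnn_vgg16_bn", pretrained=True)
+# >>> by_instance = recognition_predictor(crnn_vgg16_bn(pretrained=True), symmetric_pad=True)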
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
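+
+# A minimal end-to-end sketch (assumes the pretrained weights can be fetched):
+# >>> from doctr.io import DocumentFile
+# >>> doc = DocumentFile.from_images(["page.jpg"])
+# >>> predictor = ocr_predictor(pretrained=True, assume_straight_pages=False)
+# >>> result = predictor(doc)  # result.pages holds the structured output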
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
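+
+# A minimal KIE sketch (assumes the pretrained weights can be fetched; unlike the
+# OCR predictor, the output groups predictions per detection class):
+# >>> model = kie_predictor(pretrained=True)
+# >>> out = model([input_page])
+# >>> per_class = out.pages[0].predictions  # dict mapping class name -> predictions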
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example:
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: the offset added to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example:
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example:
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: the offset added to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example:
- >>> from doctr.transforms import RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomJpegQuality, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
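-# A minimal pipeline sketch chaining the modules above:
-# >>> augment = Compose([
-# ...     RandomApply(ColorInversion(), p=.2),
-# ...     OneOf([RandomBrightness(), RandomContrast(), RandomSaturation()]),
-# ... ])
-# >>> out = augment(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))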
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
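+
+# e.g. string_match("Hello", "hello") -> (False, True, False, True): only the
+# caseless and unicase levels tolerate the capitalization difference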
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing the metric
- ignore_accents: if true, ignore accent errors when computing the metric"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
        gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
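Putting the pieces together, a minimal sketch of the dictionary summary() returns:

>>> from doctr.utils import TextMatch
>>> metric = TextMatch()
>>> metric.update(["Hello", "world"], ["hello", "world"])
>>> metric.summary()  # one of two raw matches, both match ignoring case
{'raw': 0.5, 'caseless': 1.0, 'anyascii': 0.5, 'unicase': 1.0}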
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
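For intuition, a small worked example (assuming box_iou is importable from doctr.utils.metrics): two 100 x 100 boxes overlapping on a 50 x 50 region share 2500 units of area out of a 17500-unit union:

>>> import numpy as np
>>> from doctr.utils.metrics import box_iou  # import path assumed
>>> a = np.array([[0, 0, 100, 100]], dtype=np.float32)
>>> b = np.array([[50, 50, 150, 150]], dtype=np.float32)
>>> box_iou(a, b)[0, 0]  # 2500 / 17500, roughly 0.143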
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
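The same idea extended to 4-point polygons, sketched here with two axis-aligned squares (import path assumed; shapely must be installed):

>>> import numpy as np
>>> from doctr.utils.metrics import polygon_iou  # import path assumed
>>> sq_a = np.array([[[0, 0], [2, 0], [2, 2], [0, 2]]], dtype=np.float32)
>>> sq_b = np.array([[[1, 1], [3, 1], [3, 3], [1, 3]]], dtype=np.float32)
>>> polygon_iou(sq_a, sq_b)[0, 0]  # intersection 1, union 7, so roughly 0.143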
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: IoU threshold above which overlapping boxes are suppressed.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
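A quick sanity check on hypothetical scores (import path assumed):

>>> import numpy as np
>>> from doctr.utils.metrics import nms  # import path assumed
>>> boxes = np.array([
...     [0, 0, 10, 10, 0.9],    # highest score, kept first
...     [1, 1, 10, 10, 0.8],    # IoU of 0.81 with the first box, suppressed
...     [20, 20, 30, 30, 0.7],  # disjoint, kept
... ])
>>> nms(boxes, thresh=0.5)  # keeps indices 0 and 2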
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
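Reading the three scores together, a sketch with one ground truth and two predictions, one of which is well localized (illustrative relative coordinates):

>>> import numpy as np
>>> from doctr.utils import LocalizationConfusion
>>> metric = LocalizationConfusion(iou_thresh=0.5)
>>> metric.update(np.array([[0.1, 0.1, 0.4, 0.4]]),
...               np.array([[0.1, 0.1, 0.4, 0.4], [0.6, 0.6, 0.9, 0.9]]))
>>> metric.summary()  # recall 1.0, precision 0.5, mean IoU 0.5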
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (over all detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
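An end-to-end sketch: one ground-truth word, two predictions, one of which is both well localized and correctly read (illustrative relative coordinates):

>>> import numpy as np
>>> from doctr.utils import OCRMetric
>>> metric = OCRMetric(iou_thresh=0.5)
>>> metric.update(np.array([[0.1, 0.1, 0.5, 0.5]]),
...               np.array([[0.1, 0.1, 0.5, 0.5], [0.6, 0.6, 0.9, 0.9]]),
...               ["hello"], ["hello", "world"])
>>> recall, precision, mean_iou = metric.summary()
>>> recall["raw"], precision["raw"], mean_iou  # (1.0, 0.5, 0.5)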
+
+
+
+
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 4, 2) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 4, 2) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall, precision and mean IoU scores
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (over all detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
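The same pattern with class indices instead of strings (illustrative values):

>>> import numpy as np
>>> from doctr.utils import DetectionMetric
>>> metric = DetectionMetric(iou_thresh=0.5)
>>> metric.update(np.array([[0.1, 0.1, 0.5, 0.5]]),
...               np.array([[0.1, 0.1, 0.5, 0.5], [0.6, 0.6, 0.9, 0.9]]),
...               np.array([0]), np.array([0, 1]))
>>> metric.summary()  # recall 1.0, precision 0.5, mean IoU 0.5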
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: set to True if the predictor was called with preserve_aspect_ratio=True
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: set to True if the predictor was called with preserve_aspect_ratio=True
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
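For instance, to reserve one distinct color per predicted class before drawing:

>>> palette = get_colors(3)  # three (r, g, b) tuples spread around the hue wheel
>>> len(palette)
3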
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, which needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mlp Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, which needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%})",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mlp Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
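A usage sketch (the function displays the result with matplotlib rather than returning it):

>>> import numpy as np
>>> image = np.zeros((100, 200, 3), dtype=np.uint8)
>>> boxes = np.array([[0.25, 0.2, 0.75, 0.8]])  # one relative box around the center
>>> draw_boxes(boxes, image)  # rectangles drawn with the default (0, 0, 255) color
>>> import matplotlib.pyplot as plt
>>> plt.show()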
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-..autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developper mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Either performed at once or separately, to each task corresponds a type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors lets you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, we warm-up the model and then we measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection will be used to produces cropped images that will be passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedure. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module regroups non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model performances.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its own way of loading a sample, but batch aggregation and the underlying iterator are handled by a dedicated object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import CORD, DataLoader
->>> train_set = CORD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-
-
-
-
-
-
-Name
-Size
-Characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
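-These vocabs can also be retrieved programmatically (a sketch, assuming the VOCABS mapping exposed by doctr.datasets):
->>> from doctr.datasets import VOCABS
->>> print(VOCABS['french'])
-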
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a numpy array
-
-
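-A minimal call might look like this (a sketch; the vocab string is illustrative):
->>> from doctr.datasets import encode_sequences
->>> encoded = encode_sequences(sequences=['hello'], vocab='abcdefghijklmnopqrstuvwxyz', target_size=8)
->>> encoded.shape
-(1, 8)
-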
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size
-
-
-
-
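-For instance, a Word can be built directly from its value, confidence and relative box (a sketch):
->>> from doctr.documents import Word
->>> word = Word('hello', 0.95, ((0.1, 0.1), (0.3, 0.15)))
-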
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same height but in different columns belong to two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
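-Building on the previous example, a Line simply wraps a list of words (a sketch; when omitted, the geometry is resolved as the box enclosing all words):
->>> from doctr.documents import Line
->>> line = Line([word])
-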
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and the confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into images in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF file returned as a bytes stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
AWS Lambda
-
+
diff --git a/v0.1.1/using_doctr/sharing_models.html b/v0.1.1/using_doctr/sharing_models.html
index d76b4017f4..c9e978400a 100644
--- a/v0.1.1/using_doctr/sharing_models.html
+++ b/v0.1.1/using_doctr/sharing_models.html
@@ -14,7 +14,7 @@
-
+
Share your model with the community - docTR documentation
@@ -544,7 +544,7 @@ Recognition
-
+
diff --git a/v0.1.1/using_doctr/using_contrib_modules.html b/v0.1.1/using_doctr/using_contrib_modules.html
index 50598dae5d..0c5fffdf9f 100644
--- a/v0.1.1/using_doctr/using_contrib_modules.html
+++ b/v0.1.1/using_doctr/using_contrib_modules.html
@@ -14,7 +14,7 @@
-
+
Integrate contributions into your pipeline - docTR documentation
@@ -415,7 +415,7 @@ ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets.core - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# # List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionnary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char is still not in vocab, fall back to the unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
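A layout sketch of the padded output; the special-symbol codes 26/27/28 are illustrative choices outside the 26-entry vocab:

out = encode_sequences(["ab", "abcd"], vocab="abcdefghijklmnopqrstuvwxyz", target_size=8, eos=26, sos=27, pad=28)
# out[0] -> [27, 0, 1, 26, 28, 28, 28, 28]  (sos, "ab", eos, then padding)
# out[1] -> [27, 0, 1, 2, 3, 26, 28, 28]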
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: a array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
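A sketch of the grouping behaviour, assuming `img` is an already-decoded page and the class names are illustrative:

import numpy as np

polys = np.zeros((3, 4, 2), dtype=np.float32)  # absolute polygon coordinates in practice
img, boxes_dict = pre_transform_multiclass(img, (polys, ["words", "artefacts", "words"]))
# boxes_dict -> {"artefacts": array of shape (1, 4, 2), "words": array of shape (2, 4, 2)}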
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- doctr.documents.elements - docTR documentation
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
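A sketch of how these elements compose into a tree (the geometries are illustrative relative boxes):

w1 = Word("hello", 0.99, ((0.1, 0.1), (0.3, 0.15)))
w2 = Word("world", 0.98, ((0.35, 0.1), (0.55, 0.15)))
line = Line([w1, w2])  # geometry resolved from the enclosed words
page = Page(blocks=[Block(lines=[line])], page_idx=0, dimensions=(1024, 768))
doc = Document(pages=[page])
assert doc.render() == "hello world"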
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- doctr.documents.reader - docTR documentation
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document, parsed with PyMuPDF (fitz)
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 pdf;
- to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
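Worked numbers for the scale computation above, for an A4 page (MediaBox of 595 x 842 points, i.e. 72 dpi):

output_size = (1684, 1190)  # H x W
scales = (output_size[1] / 595, output_size[0] / 842)  # -> (2.0, 2.0), i.e. a 144 dpi render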
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
<
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to expand (unclip) polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: polygon vertices, as an array of (x, y) coordinates
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to a ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
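Worked numbers for the unclip offset, as a sketch using the same shapely quantities:

from shapely.geometry import Polygon

poly = Polygon([(0, 0), (100, 0), (100, 40), (0, 40)])  # a 100 x 40 box
distance = poly.area * 1.5 / poly.length  # 4000 * 1.5 / 280 ≈ 21.4 px
# pyclipper then offsets the contour outward by that distance before the boundingRect call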
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1): # top-down: sum each level with the upsampled deeper level
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature maps is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
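A sanity sketch for the formula: the probe point sits directly above endpoint `a`, so the law-of-cosines branch and the nearest-endpoint fallback agree on 1.0:

import numpy as np

xs, ys = np.array([[0.0]]), np.array([[1.0]])
d = DBNet.compute_distance(xs, ys, np.array([0.0, 0.0]), np.array([2.0, 0.0]))
# d -> [[1.0]]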
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon : array of coord., to draw the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
-        for label in range(1, label_num):  # label 0 is the background component
-            points = np.array(np.where(labelimage == label)[::-1]).T
-            if points.shape[0] < 4:  # remove polygons with 3 points or fewer
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
-        seg_target = np.zeros(output_shape, dtype=bool)
-        seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
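For reference, the heart of the page removed above is the connected-components post-processing in LinkNetPostProcessor.bitmap_to_boxes. A minimal stand-alone sketch of that routine follows; the thresholds mirror the deleted defaults, while the mean-over-pixels score is a simplification of the original box_score helper rather than an exact reproduction:

import cv2
import numpy as np

def bitmap_to_boxes(prob_map: np.ndarray, bin_thresh: float = 0.15, box_thresh: float = 0.1) -> np.ndarray:
    """Turn a (H, W) probability map into relative (xmin, ymin, xmax, ymax, score) boxes."""
    bitmap = (prob_map >= bin_thresh).astype(np.uint8)
    label_num, label_img = cv2.connectedComponents(bitmap, connectivity=4)
    height, width = bitmap.shape[:2]
    min_size_box = 1 + height // 512
    boxes = []
    for label in range(1, label_num):  # label 0 is the background
        points = np.array(np.where(label_img == label)[::-1]).T.astype(np.int32)  # (x, y) pairs
        if points.shape[0] < 4:  # drop degenerate components
            continue
        score = float(prob_map[points[:, 1], points[:, 0]].mean())  # simplified objectness score
        if score < box_thresh:
            continue
        x, y, w, h = cv2.boundingRect(points)
        if min(w, h) < min_size_box:  # filter out tiny boxes
            continue
        boxes.append([x / width, y / height, (x + w) / width, (y + h) / height, score])
    return np.clip(np.asarray(boxes), 0, 1) if boxes else np.zeros((0, 5), dtype=np.float32)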
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
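In practice, the refactored detection zoo above accepts either an architecture name or an already-built model. A doctest-style sketch, assuming a docTR release where fast_base and db_resnet50 are exported:

>>> import numpy as np
>>> from doctr.models import detection_predictor, db_resnet50
>>> predictor = detection_predictor(arch="fast_base", pretrained=True)  # by name; FAST models get reparameterized
>>> predictor = detection_predictor(arch=db_resnet50(pretrained=True))  # or from a model instance
>>> page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = predictor([page])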
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
-
- doctr.models.export - docTR documentation
-
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
-        bytes: the serialized TFLite model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
-
\ No newline at end of file
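The deleted export helpers were thin wrappers around tf.lite.TFLiteConverter; an equivalent stand-alone sketch of the half-precision path, following the same recipe as the removed convert_to_fp16:

import tensorflow as tf

def convert_to_fp16(model: tf.keras.Model) -> bytes:
    # Default graph optimizations plus float16 weight storage
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_types = [tf.float16]
    return converter.convert()  # serialized TFLite flatbuffer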
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
-
- doctr.models.recognition.crnn - docTR documentation
-
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
-        with the label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
-        Args:
-            model_output: predicted logits of the model
-            target: list of target strings, encoded internally into labels and sequence lengths
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
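The greedy CTC decoding used by the deleted CTCPostProcessor is easy to reproduce in isolation. A sketch, with the blank index set to the vocabulary size as in the original:

import tensorflow as tf

def ctc_greedy_decode(logits: tf.Tensor, blank_index: int) -> tf.Tensor:
    # logits: (batch, seq_len, vocab_size + 1), batch-major
    decoded, _ = tf.nn.ctc_greedy_decoder(
        tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),  # the decoder expects time-major input
        tf.fill([tf.shape(logits)[0]], tf.shape(logits)[1]),  # every sequence spans the full width
        merge_repeated=True,
    )
    # dense (batch, seq_len) label tensor, padded with the blank index
    return tf.sparse.to_dense(decoded[0], default_value=blank_index)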
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
-
- doctr.models.recognition.sar - docTR documentation
-
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
-        # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
-            # embedded_symbol: shape (N, embedding_units)
-            embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
-            logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, 1)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + 1) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
-    Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
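The subtle part of SAR.compute_loss above is the masking: cross-entropy is averaged only over timesteps up to and including the <eos> token. A condensed sketch of that masking, assuming logits has a static class dimension:

import tensorflow as tf

def masked_sequence_loss(logits: tf.Tensor, gt: tf.Tensor, seq_len: tf.Tensor) -> tf.Tensor:
    # logits: (N, T, vocab_size + 1), gt: (N, T) integer labels, seq_len: (N,) word lengths
    input_len = tf.shape(logits)[1]
    seq_len = seq_len + 1  # keep one extra step for <eos>
    cce = tf.nn.softmax_cross_entropy_with_logits(tf.one_hot(gt, depth=logits.shape[-1]), logits)
    mask = tf.sequence_mask(seq_len, input_len)  # True up to <eos>, False afterwards
    masked = tf.where(mask, cce, tf.zeros_like(cce))
    return tf.reduce_sum(masked, axis=1) / tf.cast(seq_len, tf.float32)  # per-sample loss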
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
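Mirroring the detection side, the recognition zoo now takes a name or a model and batches crops through a PreProcessor that preserves aspect ratio. A doctest-style sketch, with crop size following the 32x128 input shape of the CRNN configs above:

>>> import numpy as np
>>> from doctr.models import recognition_predictor
>>> reco = recognition_predictor(arch="crnn_vgg16_bn", pretrained=True, batch_size=64)
>>> crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)
>>> out = reco([crop])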
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `KIEPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
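Tying the two zoos together, the end-to-end entry points above compose a detection and a recognition predictor. A doctest-style sketch of the new keyword surface; the flag values here are illustrative, not required:

>>> import numpy as np
>>> from doctr.models import ocr_predictor, kie_predictor
>>> model = ocr_predictor(
...     det_arch="fast_base",
...     reco_arch="crnn_vgg16_bn",
...     pretrained=True,
...     assume_straight_pages=False,
...     export_as_straight_boxes=True,
... )
>>> kie = kie_predictor(pretrained=True, detect_language=True)
>>> page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([page])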
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
-
- doctr.transforms.modules - docTR documentation
-
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example:
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: the offset added to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example:
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example:
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1/(1-delta)] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: the offset added to the hue channel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example:
- >>> from doctr.transforms import RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomJpegQuality, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
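A quick illustration of the four tolerance levels (a sketch, assuming `anyascii` transliterates "É" to "E"):

raw, caseless, anyascii_, unicase = string_match("Écran", "ecran")
# raw:       False  ("Écran" != "ecran")
# caseless:  False  ("écran" != "ecran")
# anyascii_: False  ("Ecran" != "ecran")
# unicase:   True   ("ecran" == "ecran")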
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns:
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
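A small worked example of the metric above (values follow from string_match; the ("€", "EUR") pair only matches at the anyascii and unicase levels):

metric = TextMatch()
metric.update(gt=["Hello", "€"], pred=["hello", "EUR"])
print(metric.summary())
# {'raw': 0.0, 'caseless': 0.5, 'anyascii': 0.5, 'unicase': 1.0}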
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
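For instance, two axis-aligned boxes whose overlap covers a quarter of each box:

import numpy as np

boxes_a = np.array([[0, 0, 100, 100]], dtype=np.float32)
boxes_b = np.array([[50, 50, 150, 150]], dtype=np.float32)
print(box_iou(boxes_a, boxes_b))  # [[0.14285715]] = 2500 / (10000 + 10000 - 2500)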
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
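The rotated counterpart expects (N, 4, 2) arrays of corner points; e.g. two unit-offset squares:

import numpy as np

poly_a = np.array([[[0, 0], [2, 0], [2, 2], [0, 2]]], dtype=np.float32)
poly_b = np.array([[[1, 1], [3, 1], [3, 3], [1, 3]]], dtype=np.float32)
print(polygon_iou(poly_a, poly_b))  # [[0.14285715]] = 1 / (4 + 4 - 1)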
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
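A toy run: the second box overlaps the first well above the threshold and is suppressed, while the disjoint third box survives:

import numpy as np

boxes = np.array([
    [0, 0, 100, 100, 0.9],
    [5, 5, 105, 105, 0.8],      # IoU ~0.82 with the first box -> suppressed
    [200, 200, 300, 300, 0.7],  # disjoint -> kept
], dtype=np.float32)
print(nms(boxes, thresh=0.5))  # indices 0 and 2 are kept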
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns:
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
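The pair assignment inside `update` relies on the Hungarian algorithm; in isolation, a sketch of the same logic:

import numpy as np
from scipy.optimize import linear_sum_assignment

iou_mat = np.array([[0.7, 0.1], [0.2, 0.3]])
gt_idx, pred_idx = linear_sum_assignment(-iou_mat)  # maximizes the total IoU
matches = int((iou_mat[gt_idx, pred_idx] >= 0.5).sum())
print(gt_idx, pred_idx, matches)  # [0 1] [0 1] 1  (only the 0.7 pair clears the threshold)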
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequences to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns:
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns:
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
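A toy check of the metric (one ground truth matched out of two predictions, so per the formulas above recall is 1.0 and precision 0.5):

import numpy as np

metric = DetectionMetric(iou_thresh=0.5)
metric.update(
    np.array([[0, 0, 100, 100]], dtype=np.float32),                      # 1 ground truth
    np.array([[0, 0, 90, 90], [200, 200, 250, 250]], dtype=np.float32),  # 2 predictions
    np.zeros(1, dtype=np.int64),
    np.array([0, 1], dtype=np.int64),
)
recall, precision, mean_iou = metric.summary()  # 1.0, 0.5, ~0.4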
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
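For instance, a straight word box at relative coordinates on a 600x800 page (a sketch; the page dimensions and label are illustrative):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
patch = rect_patch(
    ((0.1, 0.1), (0.4, 0.2)),  # ((xmin, ymin), (xmax, ymax)) in relative coords
    (600, 800),                # page dimensions as (height, width)
    label="hello",
    color=(0, 0, 1),
)
ax.add_patch(patch)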
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
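This is how `visualize_kie_page` below derives one color per predicted class, e.g. (the class names are hypothetical):

class_names = ["names", "dates", "totals"]  # hypothetical KIE classes
palette = dict(zip(class_names, get_colors(len(class_names))))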
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+ >>> visualize_page(out.pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest window side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mlp Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+ >>> visualize_kie_page(out.pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest window side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mlp Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
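A minimal sketch of its use (relative coordinates; the color tuple follows the OpenCV convention used internally):

import numpy as np

image = np.zeros((600, 800, 3), dtype=np.uint8)
boxes = np.array([[0.1, 0.1, 0.4, 0.2], [0.5, 0.5, 0.9, 0.8]])  # relative (xmin, ymin, xmax, ymax)
draw_boxes(boxes, image, color=(0, 255, 0))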
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-.. autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same horizontal level form two distinct Lines, one per column).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the latest stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developer mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Whether performed jointly or separately, each task is addressed by a dedicated type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
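Conceptually, a Predictor chains these three components; a schematic sketch (all names here are illustrative, not the actual API):

.. code:: python

    # Schematic 3-stage pipeline of a Predictor (illustrative names)
    def predict(raw_images, preprocessor, model, postprocessor):
        batch = preprocessor(raw_images)        # resize, batch and normalize inputs
        raw_out = model(batch, training=False)  # forward pass of the deep learning model
        return postprocessor(raw_out)           # structure the raw outputs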
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up, then measure its average speed over 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-Experiments were run on an AWS c5.12xlarge instance (Xeon Platinum 8275L CPU).
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
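A rough TensorFlow sketch of these three steps (the target size and the normalization statistics below are placeholders, not the library's actual values):

.. code:: python

    import tensorflow as tf

    def preprocess_for_detection(images, target_size=(1024, 1024),
                                 mean=(0.5, 0.5, 0.5), std=(1.0, 1.0, 1.0)):
        # 1. resize each input image (bilinear, with potential deformation)
        resized = [tf.image.resize(img, target_size, method="bilinear") for img in images]
        # 2. batch images together (assuming 0-255 inputs, rescaled to [0, 1])
        batch = tf.stack(resized, axis=0) / 255.0
        # 3. normalize the batch using the training data statistics
        return (batch - mean) / std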
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (a binary segmentation map, for instance) into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors let you pass numpy images as inputs and get structured information back.
-
-.. autofunction:: doctr.models.detection.detection_predictor
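As a usage sketch (assuming a pretrained default architecture is available):

.. code:: python

    import numpy as np
    from doctr.models.detection import detection_predictor

    predictor = detection_predictor(pretrained=True)
    page = np.zeros((1024, 1024, 3), dtype=np.uint8)  # stand-in for a real page
    out = predictor([page])  # structured localization output for each page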
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 30595 word-level crops, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the model and feed it 100 random tensors of shape [1, 32, 128, 3] as a warm-up, then measure its average speed over 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-Experiments were run on an AWS c5.12xlarge instance (Xeon Platinum 8275L CPU).
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
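The deformation-free resize of steps 1-2 can be sketched as follows (the target size is a placeholder):

.. code:: python

    import tensorflow as tf

    def resize_and_pad(img, target=(32, 128)):
        # 1. aspect-preserving resize (no deformation) ...
        img = tf.image.resize(img, target, preserve_aspect_ratio=True)
        # 2. ... then zero-pad up to the target size
        return tf.image.pad_to_bounding_box(img, 0, 0, target[0], target[1])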
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produce one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence) into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Predictors combine the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
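A usage sketch mirroring the detection predictor (the crop below is a stand-in for a real word image):

.. code:: python

    import numpy as np
    from doctr.models.recognition import recognition_predictor

    reco = recognition_predictor(pretrained=True)
    crop = np.zeros((32, 128, 3), dtype=np.uint8)  # stand-in for a word crop
    words = reco([crop])  # one decoded string per crop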
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All the recognition models used in these predictors are trained with our French vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combined have 199 pages, which might not be representative enough of the model's capabilities*
-
-FPS (frames per second) is computed as follows: we instantiate the predictor, warm up the model, and then measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-Experiments were run on an AWS c5.12xlarge instance (Xeon Platinum 8275L CPU).
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-These architectures involve one stage of text detection and one stage of text recognition: the detection output is used to produce cropped images that are then passed to the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
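Combined with the documents module, the whole two-stage pipeline fits in a few lines; a minimal sketch (the file path is a placeholder):

.. code:: python

    from doctr.documents import DocumentFile
    from doctr.models import ocr_predictor

    model = ocr_predictor(pretrained=True)
    pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
    result = model(pages)  # Document object: pages -> blocks -> lines -> words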
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedures. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
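For instance, a deterministic pipeline followed by a randomly applied augmentation could be sketched as follows (parameter values are illustrative):

.. code:: python

    import tensorflow as tf
    from doctr.transforms import ColorInversion, Compose, Normalize, RandomApply, Resize

    transfo = Compose([
        Resize((1024, 1024)),
        Normalize(mean=(0.5, 0.5, 0.5), std=(1.0, 1.0, 1.0)),
        RandomApply(ColorInversion(), p=0.5),  # applied with probability 0.5
    ])
    out = transfo(tf.random.uniform((768, 512, 3), maxval=1.0))  # stand-in image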
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module groups non-core features that complement the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model's performance.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
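As a sketch of the intended workflow (assuming the usual update/summary interface of these metric objects):

.. code:: python

    from doctr.utils.metrics import ExactMatch

    metric = ExactMatch(ignore_case=True)
    metric.update(gt=["Hello", "world"], pred=["hello", "world"])
    print(metric.summary())  # ratio of exactly matched strings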
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
-
+
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
-
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset forPost-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its own way of loading a sample, but batch aggregation and the underlying iterator are handled by another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = FUNSD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-¶
-
-
-
-
-
-
-============= ==== ======================================================================================================================================
-Name          Size Characters
-============= ==== ======================================================================================================================================
-digits        10   0123456789
-ascii_letters 52   abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-punctuation   32   !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-currency      5    £€¥¢฿
-latin         96   0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-french        154  0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-============= ==== ======================================================================================================================================
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
-
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page's size
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words at the same height in different columns belong to two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and the confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert its pages into images in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF file, returned as a bytes stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert them into images in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from "CORD: A Consolidated Receipt Dataset for Post-OCR Parsing".
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
ArtefactDetection
-
+
diff --git a/v0.1.1/using_doctr/using_datasets.html b/v0.1.1/using_doctr/using_datasets.html
index 460476dbbf..8a7d4f0a64 100644
--- a/v0.1.1/using_doctr/using_datasets.html
+++ b/v0.1.1/using_doctr/using_datasets.html
@@ -14,7 +14,7 @@
-
+
Choose a ready to use dataset - docTR documentation
@@ -642,7 +642,7 @@ Data Loading
-
+
diff --git a/v0.1.1/using_doctr/using_model_export.html b/v0.1.1/using_doctr/using_model_export.html
index 6124c00ebe..6790dd0642 100644
--- a/v0.1.1/using_doctr/using_model_export.html
+++ b/v0.1.1/using_doctr/using_model_export.html
@@ -14,7 +14,7 @@
-
+
Preparing your model for inference - docTR documentation
@@ -467,7 +467,7 @@ Using your ONNX exported model
-
+
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets.core - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# # List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
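The updated FUNSD signature above replaces sample_transforms with three task switches. A usage sketch based on the docstring and constructor shown in this hunk (the sample types follow the data lists built in the loop above):

from doctr.datasets import FUNSD

# Default: full targets, i.e. a dict of boxes and labels per image
train_set = FUNSD(train=True, download=True)
img, target = train_set[0]  # target: {"boxes": np.ndarray, "labels": List[str]}

# Recognition mode: each sample is a word crop with its transcription
reco_set = FUNSD(train=True, download=True, recognition_task=True)
crop, word = reco_set[0]

# Detection mode: each sample is the image with a box array only
det_set = FUNSD(train=True, download=True, detection_task=True)
img, boxes = det_set[0]  # boxes: shape (N, 4), or (N, 4, 2) with use_polygons=True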
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
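The loader rewrite above swaps the workers option for an explicit collate_fn and adds __len__. A short sketch of the new interface, following the docstring in this hunk; the custom collate shown is a hypothetical example (without it, the dataset's own collate_fn or the default one is used):

import tensorflow as tf

from doctr.datasets import CORD, DataLoader

train_set = CORD(train=True, download=True)

def stack_images_keep_targets(samples):
    # Hypothetical collate: batch images into one tensor, keep targets as a list
    images, targets = zip(*samples)
    return tf.stack(images, axis=0), list(targets)

train_loader = DataLoader(train_set, shuffle=True, batch_size=32, collate_fn=stack_images_keep_targets)
print(len(train_loader))  # number of batches, via the newly added __len__
images, targets = next(iter(train_loader))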
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
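The rewritten loop above reduces each 8-value SROIE annotation row to corner coordinates. A worked example of that reshaping for a single row, using only numpy (the values are hypothetical; the batched code above applies the same min/max along axis=1 of an (N, 4, 2) stack):

import numpy as np

# One CSV row: 8 integers (4 clockwise corners) followed by the label, which may contain commas
row = ["38", "43", "160", "43", "160", "84", "38", "84", "TOTAL:", "12.00"]
label = ",".join(row[8:])  # "TOTAL:,12.00"

coords = np.array(list(map(int, row[:8])), dtype=np.float32).reshape((4, 2))
# Straight-box reduction applied when use_polygons=False:
xmin, ymin = coords.min(axis=0)
xmax, ymax = coords.max(axis=0)
box = np.array([xmin, ymin, xmax, ymax])  # [38., 43., 160., 84.]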
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionnary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char is still not in vocab, fall back to the unknown character
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: an array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their class names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
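The extended encode_sequences above supports optional SOS and PAD symbols on top of EOS padding. A sketch of how the indices line up, assuming the signature shown in this hunk; note that all special symbols must sit outside the vocab's index range, otherwise a ValueError is raised:

from doctr.datasets.utils import encode_sequences

vocab = "abcdefghijklmnopqrstuvwxyz"
eos, sos, pad = len(vocab), len(vocab) + 1, len(vocab) + 2  # indices outside the vocab

# EOS-only padding: each encoded word is followed by eos up to the target length
encoded = encode_sequences(["cat", "beetle"], vocab, eos=eos)

# With sos and pad: rows become [sos, chars..., eos, pad, pad, ...]
encoded_full = encode_sequences(["cat", "beetle"], vocab, eos=eos, sos=sos, pad=pad)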
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- doctr.documents.elements - docTR documentation
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
\ No newline at end of file
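The deleted doctr.documents.elements module above defines the Word → Line → Block → Page → Document hierarchy. The sketch below builds a minimal one-word document using the constructor signatures from the removed listing; note that later releases expose these classes under doctr.io.elements, with slightly different signatures:

from doctr.documents.elements import Block, Document, Line, Page, Word

# Relative ((xmin, ymin), (xmax, ymax)) geometries, as documented above
word = Word(value="hello", confidence=0.99, geometry=((0.1, 0.1), (0.3, 0.15)))
line = Line(words=[word])       # geometry resolved to the enclosing bbox of its words
block = Block(lines=[line])     # likewise resolved from its lines and artefacts
page = Page(blocks=[block], page_idx=0, dimensions=(595, 842))
doc = Document(pages=[page])

print(doc.render())             # "hello"
exported = doc.export()         # nested dict, one entry per child element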
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- doctr.documents.reader - docTR documentation
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document read with PyMuPDF (fitz)
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Defaults to 840 x 595 for an A4 pdf;
- to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
\ No newline at end of file
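The removed doctr.documents.reader module wraps PyMuPDF, OpenCV, and WeasyPrint behind a single DocumentFile entry point. A usage sketch matching the removed listing; current releases expose this as doctr.io.DocumentFile, where the fitz-based PDF wrapper shown above no longer exists:

from doctr.documents import DocumentFile

pages = DocumentFile.from_images(["page1.png", "page2.png"])  # List[np.ndarray], H x W x 3
pdf = DocumentFile.from_pdf("doc.pdf")                        # PDF wrapper around a fitz.Document
images = pdf.as_images(output_size=(1024, 726))               # rendered pages as numpy arrays
words = pdf.get_words()                                       # per page: [(bbox, value), ...]
web_pages = DocumentFile.from_url("https://www.yoursite.com").as_images()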
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to unshrink (expand) polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: polygon vertices to expand, as an array of (x, y) coordinates
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- np tensor boxes for the bitmap, each box is a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
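# Illustrative sketch (not part of the original listing): the unclip step in
# polygon_to_box above follows the DB paper, offsetting the shrunk text polygon
# outwards by distance = area * unclip_ratio / perimeter before boxing it.
import pyclipper

points = [[10, 10], [110, 10], [110, 30], [10, 30]]  # hypothetical 100 x 20 px polygon
distance = (100 * 20) * 1.5 / (2 * (100 + 20))  # = 12.5 px outward offset
offset = pyclipper.PyclipperOffset()
offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
expanded = offset.Execute(distance)  # list of expanded contours; the largest is kept upstream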
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channel to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1): # top-down pathway: coarsest to finest
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
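A shape trace may help: a minimal sketch feeding the FPN with four maps at strides 4/8/16/32 of a 1024x1024 input (ResNet-50-like channel counts, assumed for illustration only):

import tensorflow as tf

fpn = FeaturePyramidNetwork(channels=128)
fmaps = [
    tf.random.normal((1, 1024 // s, 1024 // s, c))
    for s, c in zip((4, 8, 16, 32), (256, 512, 1024, 2048))
]
out = fpn(fmaps)  # (1, 256, 256, 512): each map projected to 128 channels, upsampled to stride 4, concatenated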
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature map is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.ndarray,
- ys: np.ndarray,
- a: np.ndarray,
- b: np.ndarray,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs: map of x coordinates (height, width)
- ys: map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
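A worked example on a tiny grid (values chosen by hand): the point (2, 1) sits directly above the segment a=(0, 0), b=(4, 0) at height 1, so its distance is exactly 1.

import numpy as np

xs, ys = np.meshgrid(np.arange(5, dtype=np.float32), np.arange(3, dtype=np.float32))
d = DBNet.compute_distance(xs, ys, np.array([0., 0.]), np.array([4., 0.]))
assert np.isclose(d[1, 2], 1.0)  # point (x=2, y=1) is at distance 1 from the segment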
-
- def draw_thresh_map(
- self,
- polygon: np.ndarray,
- canvas: np.ndarray,
- mask: np.ndarray,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon: array of coordinates used to draw the boundary of the polygon
- canvas: threshold map to fill with polygons
- mask: mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2-dimensional array of coordinates")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
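In compact form, with the scales hard-coded above, the total objective reads:

# total_loss = 10 * L1(thresh_map, thresh_target) + 5 * balanced_BCE(prob_map, seg_target) + Dice(bin_map, seg_target)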
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
-
- doctr.models.detection.linknet - docTR documentation
-
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- numpy array of boxes for the bitmap; each box is a 5-element list
- containing the relative coordinates xmin, ymin, xmax, ymax and the box score
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num): # label 0 is the background
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or fewer
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
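To make the skip connections concrete, here is a shape trace under an assumed 512x512 input (so the stem output x arrives at stride 4):

# x   : (N, 128, 128,  64)   stem output, stride 4
# x_1 : (N,  64,  64,  64)   encoder_1        y_1 = decoder_1(y_2 + x_1) -> (N, 128, 128,  64)
# x_2 : (N,  32,  32, 128)   encoder_2        y_2 = decoder_2(y_3 + x_2) -> (N,  64,  64,  64)
# x_3 : (N,  16,  16, 256)   encoder_3        y_3 = decoder_3(y_4 + x_3) -> (N,  32,  32, 128)
# x_4 : (N,   8,   8, 512)   encoder_4        y_4 = decoder_4(x_4)       -> (N,  16,  16, 256)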
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
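Beyond the docstring example, the reworked predictor also accepts an already-instantiated model; a minimal sketch of that path (flags and shapes illustrative):

import numpy as np
from doctr.models import detection, detection_predictor

model = detection.db_resnet50(pretrained=True)
predictor = detection_predictor(arch=model, assume_straight_pages=True)
out = predictor([(255 * np.random.rand(600, 800, 3)).astype(np.uint8)])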
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
-
- doctr.models.export - docTR documentation
-
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized TFLite model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Representative dataset used to calibrate the int8 quantization ranges
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
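All three converters return the serialized model as bytes, so persisting the result is a plain file write (file name illustrative; `model` is a keras Model as in the docstring examples):

serialized_model = convert_to_fp16(model)
with open("model_fp16.tflite", "wb") as f:
    f.write(serialized_model)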
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
-
- doctr.models.recognition.crnn - docTR documentation
-
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
- with the label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of target strings for the batch
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
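A minimal invocation sketch (shapes illustrative; `model` is a built CRNN instance and `out_map` the decoder logits):

# out_map: (N, W, len(vocab) + 1) raw logits from the decoder
# loss = model.compute_loss(out_map, ["hello", "world"])  # per-sample CTC loss, shape (N,)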
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
-
- doctr.models.recognition.sar - docTR documentation
-
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
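In equation form, a compact restatement of the code above (h is the hidden state, F the feature map):

# e       = conv_1x1( tanh(W_h * h + W_f * F) )   # (N, H, W, 1)
# alpha   = softmax(flatten(e))                   # (N, H * W)
# glimpse = sum_{i,j} alpha_ij * F_ij             # (N, C)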
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill(features.shape[0], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, 1)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + 1) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
-    """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read: A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example:
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
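Given the updated signature, both call styles below should work; this is a sketch based on the code above, and the model instance in the comment is assumed to be built elsewhere:

>>> from doctr.models import recognition_predictor
>>> predictor = recognition_predictor("crnn_vgg16_bn", pretrained=True)  # by architecture name
>>> # predictor = recognition_predictor(my_model, symmetric_pad=True, batch_size=64)  # or pass a model instance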
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
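As a usage sketch of the new keyword arguments (names taken from the signature above), rotated-page handling could be enabled like so:

>>> import numpy as np
>>> from doctr.models import ocr_predictor
>>> model = ocr_predictor(pretrained=True, assume_straight_pages=False, export_as_straight_boxes=True)
>>> page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = model([page])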
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+    >>> from doctr.models import kie_predictor
+    >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
- doctr.transforms.modules - docTR documentation
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
-    """Apply a user-defined transformation to a tensor
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
-    """Convert an RGB tensor (batch of images or image) to a 3-channel grayscale tensor
-
- Example::
-        >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
-    """Applies the following transformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
-        >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example:
-        >>> from doctr.transforms import RandomBrightness
-        >>> import tensorflow as tf
-        >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- p: probability to apply transformation
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example:
-        >>> from doctr.transforms import RandomContrast
-        >>> import tensorflow as tf
-        >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example:
-        >>> from doctr.transforms import RandomSaturation
-        >>> import tensorflow as tf
-        >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
-        >>> from doctr.transforms import RandomHue
-        >>> import tensorflow as tf
-        >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example:
-        >>> from doctr.transforms import RandomGamma
-        >>> import tensorflow as tf
-        >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
-        >>> from doctr.transforms import RandomJpegQuality
-        >>> import tensorflow as tf
-        >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
-        >>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
-        >>> import tensorflow as tf
-        >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
-        >>> from doctr.transforms import RandomApply, RandomGamma
-        >>> import tensorflow as tf
-        >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
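Tying this (now removed) module together, a typical augmentation pipeline could be composed as follows; a sketch only, with illustrative parameter values:

>>> import tensorflow as tf
>>> from doctr.transforms import Compose, OneOf, RandomApply, RandomBrightness, RandomContrast, RandomGamma
>>> pipeline = Compose([RandomApply(RandomBrightness(0.3), p=.5), OneOf([RandomContrast(0.3), RandomGamma()])])
>>> out = pipeline(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))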
-
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
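A doctest-style illustration of the four tolerance levels, whose outputs follow directly from the definitions above:

>>> string_match("Hello", "Hello")
(True, True, True, True)
>>> string_match("Café", "cafe")  # only the lower-case anyascii forms coincide
(False, False, False, True)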
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
        gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
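For a concrete feel of box_iou, two 2x2 boxes overlapping on half their width intersect over an area of 2 and unite over 6, giving an IoU of 1/3:

>>> import numpy as np
>>> a = np.array([[0, 0, 2, 2]], dtype=np.float32)
>>> b = np.array([[1, 0, 3, 2]], dtype=np.float32)
>>> box_iou(a, b)  # -> array([[0.3333...]], dtype=float32)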
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
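polygon_iou agrees with box_iou on axis-aligned boxes expressed as 4-point polygons, which makes a quick sanity check:

>>> import numpy as np
>>> sq1 = np.array([[[0, 0], [2, 0], [2, 2], [0, 2]]], dtype=np.float32)
>>> sq2 = np.array([[[1, 0], [3, 0], [3, 2], [1, 2]]], dtype=np.float32)
>>> polygon_iou(sq1, sq2)  # same geometry as the box_iou example -> ~0.333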
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+    """Perform non-max suppression, borrowed from `Fast R-CNN <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
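A usage sketch for nms: with two heavily overlapping boxes, only the higher-scoring one survives:

>>> import numpy as np
>>> boxes = np.array([[0, 0, 10, 10, 0.9], [1, 1, 10, 10, 0.8], [20, 20, 30, 30, 0.7]])
>>> nms(boxes, thresh=0.5)  # -> [0, 2]: the middle box (IoU 0.81 with the first) is suppressed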
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
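To see how the assignment step pairs ground truths with predictions, here is a tiny worked example (illustrative IoU values):

>>> import numpy as np
>>> from scipy.optimize import linear_sum_assignment
>>> iou_mat = np.array([[0.9, 0.8], [0.85, 0.1]])  # rows: ground truths, columns: predictions
>>> gt_idx, pred_idx = linear_sum_assignment(-iou_mat)  # maximizes the total IoU of the pairing
>>> gt_idx.tolist(), pred_idx.tolist()
([0, 1], [1, 0])
>>> int((iou_mat[gt_idx, pred_idx] >= 0.5).sum())  # matches retained at IoU >= 0.5
2

Pairing the highest cell first (0.9) would strand the second ground truth with an IoU of 0.1 and yield a single match; the Hungarian solver finds the globally optimal pairing instead.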
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
-        max_dist: maximum Levenshtein distance between two sequences to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+                "there should be the same number of boxes and strings for both the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+                "there should be the same number of boxes and labels for both the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
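For reference, the reworked rect_patch takes relative coordinates plus the page dimensions and returns a plain matplotlib Rectangle in absolute pixels. A quick sketch (the geometry and label are illustrative):
>>> from doctr.utils.visualization import rect_patch
>>> patch = rect_patch(((0.1, 0.1), (0.4, 0.2)), (600, 800), label="word", color=(0, 0, 1))
>>> round(patch.get_width()), round(patch.get_height())  # absolute pixels on a 600x800 page
(240, 60)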
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor was called with preserve_aspect_ratio=True
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if geometry.shape != (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
+
+
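create_obj_patch simply dispatches on the geometry type: a 2-point tuple goes to rect_patch, while a 4-point tuple or a (4, 2) array goes to polygon_patch. A short sketch with made-up geometries:
>>> import numpy as np
>>> from doctr.utils.visualization import create_obj_patch
>>> straight = create_obj_patch(((0.1, 0.1), (0.3, 0.2)), (600, 800))
>>> rotated = create_obj_patch(np.array([[0.1, 0.1], [0.3, 0.1], [0.3, 0.2], [0.1, 0.2]]), (600, 800))
>>> type(straight).__name__, type(rotated).__name__
('Rectangle', 'Polygon')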
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
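get_colors spreads hues evenly around the color wheel and slightly randomizes lightness and saturation, so exact channel values differ between calls; only the output shape is stable:
>>> from doctr.utils.visualization import get_colors
>>> palette = get_colors(3)
>>> len(palette), len(palette[0])  # three RGB triples with float channels in [0, 1]
(3, 3)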
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mlp Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
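Since mplcursors is now imported lazily, a static render no longer requires it. Reusing the names from the docstring example above, a hedged non-interactive variant:
>>> fig = visualize_page(out[0].pages[0].export(), input_page, interactive=False, add_labels=True)
>>> fig.savefig("page_viz.png")  # hypothetical output path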
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mlp Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
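draw_boxes deep-copies the box array before scaling, but draws straight onto the provided image with OpenCV and hands it to pyplot. A minimal sketch on a synthetic page:
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from doctr.utils.visualization import draw_boxes
>>> page = np.zeros((600, 800, 3), dtype=np.uint8)
>>> boxes = np.array([[0.1, 0.1, 0.4, 0.2]])  # relative (xmin, ymin, xmax, ymax)
>>> draw_boxes(boxes, page, color=(0, 255, 0))
>>> plt.show()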
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-..autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developper mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Either performed at once or separately, to each task corresponds a type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors lets you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, we warm-up the model and then we measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection will be used to produces cropped images that will be passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedure. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module regroups non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model performances.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
- doctr.datasets - docTR documentation
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset forPost-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-..autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = CORD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before passing it to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-¶
-
-
-
-
-
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
- doctr.documents - docTR documentation
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-size (the page's)
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degress and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset forPost-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-Getting Started¶
-- Installation
-Contents¶
@@ -364,7 +381,7 @@ Contents
Using your ONNX exported model
-
+
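The index page above advertises a three-line predictor workflow; a minimal sketch of what that maps to in the current package, assuming a local sample.pdf:

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

doc = DocumentFile.from_pdf("sample.pdf")    # pages as numpy arrays
predictor = ocr_predictor(pretrained=True)   # 2-stage detection + recognition pipeline
result = predictor(doc)                      # structured Document object

print(result.render())                       # plain-text export of the OCR result
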
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
index 61f1f5ab7a..9ead8498e1 100644
--- a/v0.1.1/using_doctr/using_models.html
+++ b/v0.1.1/using_doctr/using_models.html
@@ -14,7 +14,7 @@
-
+
Choosing the right model - docTR documentation
@@ -1253,7 +1253,7 @@ Advanced options
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/cord.html b/v0.2.0/_modules/doctr/datasets/cord.html
index de8018d676..55b0584830 100644
--- a/v0.2.0/_modules/doctr/datasets/cord.html
+++ b/v0.2.0/_modules/doctr/datasets/cord.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.cord - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -394,8 +461,8 @@ Source code for doctr.datasets.cord
-
-
+
+
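The rewritten CORD class above exposes three mutually exclusive target formats; a hedged sketch of the difference (downloads the dataset on first use):

from doctr.datasets import CORD

# Default: one sample per page, boxes + labels together
train_set = CORD(train=True, download=True)
img, target = train_set[0]
print(target["boxes"].shape, len(target["labels"]))

# recognition_task=True: one cropped word image per sample, string target
rec_set = CORD(train=True, download=True, recognition_task=True)
crop, word = rec_set[0]

# detection_task=True: boxes only; setting both flags raises a ValueError
det_set = CORD(train=True, download=True, detection_task=True)
img, boxes = det_set[0]
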
diff --git a/v0.2.0/_modules/doctr/datasets/core.html b/v0.2.0/_modules/doctr/datasets/core.html
deleted file mode 100644
index a1d2ee62ad..0000000000
--- a/v0.2.0/_modules/doctr/datasets/core.html
+++ /dev/null
@@ -1,392 +0,0 @@
- doctr.datasets.core - docTR documentation
- Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
\ No newline at end of file
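The deleted core module above supplied the download/extract plumbing that every dataset reuses; a sketch of a custom subclass following the same pattern, with a placeholder URL and checksum (the base class now lives in doctr.datasets.datasets, as the imports in the diffs above show):

from typing import Any

from doctr.datasets.datasets import VisionDataset

class MyReceipts(VisionDataset):
    """Toy dataset built on the VisionDataset download/extract pattern."""

    URL = "https://example.com/my_receipts.zip"  # placeholder
    SHA256 = "0" * 64                            # placeholder checksum

    def __init__(self, **kwargs: Any) -> None:
        super().__init__(self.URL, "my_receipts.zip", self.SHA256, True, **kwargs)
        # Populate self.data with (img_path, target) pairs here
        self.data = []
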
diff --git a/v0.2.0/_modules/doctr/datasets/detection.html b/v0.2.0/_modules/doctr/datasets/detection.html
index 739563e466..718001e4cf 100644
--- a/v0.2.0/_modules/doctr/datasets/detection.html
+++ b/v0.2.0/_modules/doctr/datasets/detection.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.detection - docTR documentation
@@ -430,7 +430,7 @@ Source code for doctr.datasets.detection
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
index 3313ae4660..94c32aaa0f 100644
--- a/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
+++ b/v0.2.0/_modules/doctr/datasets/doc_artefacts.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.doc_artefacts - docTR documentation
@@ -414,7 +414,7 @@ Source code for doctr.datasets.doc_artefacts
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/funsd.html b/v0.2.0/_modules/doctr/datasets/funsd.html
index f536b9282c..f08612f9fa 100644
--- a/v0.2.0/_modules/doctr/datasets/funsd.html
+++ b/v0.2.0/_modules/doctr/datasets/funsd.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.funsd - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
@@ -388,8 +453,8 @@ Source code for doctr.datasets.funsd
-
-
+
+
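The use_polygons flag added to FUNSD above changes the target geometry; a short sketch of the two layouts:

from doctr.datasets import FUNSD

straight = FUNSD(train=True, download=True)
img, target = straight[0]
print(target["boxes"].shape)   # (N, 4): xmin, ymin, xmax, ymax per word

rotated = FUNSD(train=True, download=True, use_polygons=True)
img, target = rotated[0]
print(target["boxes"].shape)   # (N, 4, 2): four (x, y) corners per word
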
diff --git a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
index 9f562582d9..a3e619f720 100644
--- a/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
+++ b/v0.2.0/_modules/doctr/datasets/generator/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.generator.tensorflow - docTR documentation
@@ -395,7 +395,7 @@ Source code for doctr.datasets.generator.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic03.html b/v0.2.0/_modules/doctr/datasets/ic03.html
index 3d221d07de..60e54a8a4b 100644
--- a/v0.2.0/_modules/doctr/datasets/ic03.html
+++ b/v0.2.0/_modules/doctr/datasets/ic03.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic03 - docTR documentation
@@ -468,7 +468,7 @@ Source code for doctr.datasets.ic03
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ic13.html b/v0.2.0/_modules/doctr/datasets/ic13.html
index 8137e08e9f..219c98dcd1 100644
--- a/v0.2.0/_modules/doctr/datasets/ic13.html
+++ b/v0.2.0/_modules/doctr/datasets/ic13.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ic13 - docTR documentation
@@ -440,7 +440,7 @@ Source code for doctr.datasets.ic13
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiit5k.html b/v0.2.0/_modules/doctr/datasets/iiit5k.html
index 1fc8ecfb27..b49c80fe18 100644
--- a/v0.2.0/_modules/doctr/datasets/iiit5k.html
+++ b/v0.2.0/_modules/doctr/datasets/iiit5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiit5k - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.datasets.iiit5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/iiithws.html b/v0.2.0/_modules/doctr/datasets/iiithws.html
index 07f5b13685..f7220afbc7 100644
--- a/v0.2.0/_modules/doctr/datasets/iiithws.html
+++ b/v0.2.0/_modules/doctr/datasets/iiithws.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.iiithws - docTR documentation
@@ -407,7 +407,7 @@ Source code for doctr.datasets.iiithws
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/imgur5k.html b/v0.2.0/_modules/doctr/datasets/imgur5k.html
index 68d433ca62..51c6545db4 100644
--- a/v0.2.0/_modules/doctr/datasets/imgur5k.html
+++ b/v0.2.0/_modules/doctr/datasets/imgur5k.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.imgur5k - docTR documentation
@@ -488,7 +488,7 @@ Source code for doctr.datasets.imgur5k
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/loader.html b/v0.2.0/_modules/doctr/datasets/loader.html
index 5108e3b731..ed80350ef0 100644
--- a/v0.2.0/_modules/doctr/datasets/loader.html
+++ b/v0.2.0/_modules/doctr/datasets/loader.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.loader - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before being passed to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
-
-
+
+
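The reworked DataLoader above swaps the workers argument for an optional collate_fn; a hedged sketch of batching with a custom collate, assuming images are resized to a common shape so they can be stacked:

import tensorflow as tf

from doctr.datasets import FUNSD, DataLoader
from doctr.transforms import Resize

train_set = FUNSD(train=True, download=True, img_transforms=Resize((512, 512)))

def collate(samples):
    # Merge N (image, target) pairs into one batch tensor + a list of targets
    images, targets = zip(*samples)
    return tf.stack(images, axis=0), list(targets)

loader = DataLoader(train_set, shuffle=True, batch_size=16, collate_fn=collate)
print(len(loader))                  # number of batches, via the new __len__
images, targets = next(iter(loader))
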
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
@@ -396,8 +444,8 @@ Source code for doctr.datasets.sroie
-
-
+
+
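The coordinate handling in the rewritten SROIE loader above is plain numpy; a self-contained sketch of the same 8-value-to-box reduction on one annotation row (values are made up):

import numpy as np

# One CSV row: 8 ints = 4 (x, y) corners, then the label (which may contain commas)
row = ["10", "20", "110", "22", "108", "60", "8", "58", "TOTAL:", "12.00"]

poly = np.array(list(map(int, row[:8])), dtype=np.float32).reshape(4, 2)
label = ",".join(row[8:])

# Straight-box reduction applied when use_polygons=False
straight = np.concatenate((poly.min(axis=0), poly.max(axis=0)))
print(label, straight)   # TOTAL:,12.00 [  8.  20. 110.  60.]
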
diff --git a/v0.2.0/_modules/doctr/datasets/svhn.html b/v0.2.0/_modules/doctr/datasets/svhn.html
index 48e4e4d210..60e02b1b3b 100644
--- a/v0.2.0/_modules/doctr/datasets/svhn.html
+++ b/v0.2.0/_modules/doctr/datasets/svhn.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svhn - docTR documentation
@@ -473,7 +473,7 @@ Source code for doctr.datasets.svhn
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/svt.html b/v0.2.0/_modules/doctr/datasets/svt.html
index 4144dc6b9b..a997fcbb50 100644
--- a/v0.2.0/_modules/doctr/datasets/svt.html
+++ b/v0.2.0/_modules/doctr/datasets/svt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.svt - docTR documentation
@@ -459,7 +459,7 @@ Source code for doctr.datasets.svt
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/synthtext.html b/v0.2.0/_modules/doctr/datasets/synthtext.html
index 3b9de506a7..c776e1d673 100644
--- a/v0.2.0/_modules/doctr/datasets/synthtext.html
+++ b/v0.2.0/_modules/doctr/datasets/synthtext.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.synthtext - docTR documentation
@@ -470,7 +470,7 @@ Source code for doctr.datasets.synthtext
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/utils.html b/v0.2.0/_modules/doctr/datasets/utils.html
index aedf276e89..bde9304597 100644
--- a/v0.2.0/_modules/doctr/datasets/utils.html
+++ b/v0.2.0/_modules/doctr/datasets/utils.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.utils - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char still not in vocab, return unknown character)
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: a array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
@@ -421,8 +553,8 @@ Source code for doctr.datasets.utils
-
-
+
+
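The encode_sequences/decode_sequence pair above is easiest to see on a round trip; a minimal sketch with a toy vocabulary (EOS and PAD are placed outside the vocab indices, as the checks above require):

from doctr.datasets.utils import decode_sequence, encode_sequences

vocab = "abcdefghijklmnopqrstuvwxyz"

encoded = encode_sequences(["cat", "giraffe"], vocab, eos=len(vocab), pad=len(vocab) + 1)
print(encoded.shape)   # (2, 9): longest word + 1 EOS + 1 PAD slot

# Strip EOS/PAD before decoding a single row back to a string
seq = [idx for idx in encoded[0] if idx < len(vocab)]
print(decode_sequence(seq, vocab))   # cat
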
diff --git a/v0.2.0/_modules/doctr/datasets/wildreceipt.html b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
index c543ee7cac..12c6aebd14 100644
--- a/v0.2.0/_modules/doctr/datasets/wildreceipt.html
+++ b/v0.2.0/_modules/doctr/datasets/wildreceipt.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.wildreceipt - docTR documentation
@@ -454,7 +454,7 @@ Source code for doctr.datasets.wildreceipt
-
+
diff --git a/v0.2.0/_modules/doctr/documents/elements.html b/v0.2.0/_modules/doctr/documents/elements.html
deleted file mode 100644
index df3a989d4a..0000000000
--- a/v0.2.0/_modules/doctr/documents/elements.html
+++ /dev/null
@@ -1,550 +0,0 @@
- doctr.documents.elements - docTR documentation
- Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
\ No newline at end of file
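The deleted elements module above defines the document tree that predictors return; a minimal sketch building one by hand with the legacy doctr.documents import (geometries are relative ((xmin, ymin), (xmax, ymax)) pairs):

from doctr.documents import Block, Document, Line, Page, Word

w1 = Word("Hello", 0.99, ((0.10, 0.10), (0.30, 0.15)))
w2 = Word("world", 0.98, ((0.32, 0.10), (0.50, 0.15)))
line = Line([w1, w2])            # geometry resolved from the enclosed words
block = Block(lines=[line])
page = Page(blocks=[block], page_idx=0, dimensions=(595, 842))
doc = Document(pages=[page])

print(doc.render())              # Hello world
print(doc.export()["pages"][0]["dimensions"])
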
diff --git a/v0.2.0/_modules/doctr/documents/reader.html b/v0.2.0/_modules/doctr/documents/reader.html
deleted file mode 100644
index 43865531a4..0000000000
--- a/v0.2.0/_modules/doctr/documents/reader.html
+++ /dev/null
@@ -1,606 +0,0 @@
- doctr.documents.reader - docTR documentation
- Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document opened with PyMuPDF (fitz)
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Open the document with fitz; pages are rendered to numpy arrays downstream
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Default goes to 840 x 595 for A4 pdf,
- if you want to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, keep the origin one
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy ndarray
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
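The returned byte stream can be passed straight to read_pdf, which opens it through fitz's stream argument; a short sketch with a placeholder URL:

>>> pdf_bytes = read_html("https://www.yoursite.com")
>>> doc = read_pdf(pdf_bytes)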
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/io/elements.html b/v0.2.0/_modules/doctr/io/elements.html
index 753a47455c..e049d6ce30 100644
--- a/v0.2.0/_modules/doctr/io/elements.html
+++ b/v0.2.0/_modules/doctr/io/elements.html
@@ -13,7 +13,7 @@
-
+
doctr.io.elements - docTR documentation
@@ -1008,7 +1008,7 @@ Source code for doctr.io.elements
-
+
diff --git a/v0.2.0/_modules/doctr/io/html.html b/v0.2.0/_modules/doctr/io/html.html
index 7ad5b97031..be73631500 100644
--- a/v0.2.0/_modules/doctr/io/html.html
+++ b/v0.2.0/_modules/doctr/io/html.html
@@ -13,7 +13,7 @@
-
+
doctr.io.html - docTR documentation
@@ -360,7 +360,7 @@ Source code for doctr.io.html
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/base.html b/v0.2.0/_modules/doctr/io/image/base.html
index 336b4bff0e..a50c95d595 100644
--- a/v0.2.0/_modules/doctr/io/image/base.html
+++ b/v0.2.0/_modules/doctr/io/image/base.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.base - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.io.image.base
-
+
diff --git a/v0.2.0/_modules/doctr/io/image/tensorflow.html b/v0.2.0/_modules/doctr/io/image/tensorflow.html
index f1846820a3..3b9e731756 100644
--- a/v0.2.0/_modules/doctr/io/image/tensorflow.html
+++ b/v0.2.0/_modules/doctr/io/image/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.io.image.tensorflow - docTR documentation
@@ -445,7 +445,7 @@ Source code for doctr.io.image.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/io/pdf.html b/v0.2.0/_modules/doctr/io/pdf.html
index e3abf6960b..e5b94811c3 100644
--- a/v0.2.0/_modules/doctr/io/pdf.html
+++ b/v0.2.0/_modules/doctr/io/pdf.html
@@ -13,7 +13,7 @@
-
+
doctr.io.pdf - docTR documentation
@@ -377,7 +377,7 @@ Source code for doctr.io.pdf
-
+
diff --git a/v0.2.0/_modules/doctr/io/reader.html b/v0.2.0/_modules/doctr/io/reader.html
index c1ddc26edd..d36e5bb553 100644
--- a/v0.2.0/_modules/doctr/io/reader.html
+++ b/v0.2.0/_modules/doctr/io/reader.html
@@ -13,7 +13,7 @@
-
+
doctr.io.reader - docTR documentation
@@ -426,7 +426,7 @@ Source code for doctr.io.reader
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
index 9f074805c1..61a010d548 100644
--- a/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/magc_resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.magc_resnet.tensorflow - docTR documentation
@@ -531,7 +531,7 @@ Source code for doctr.models.classification.magc_resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
index 6a63851276..7c448394ad 100644
--- a/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/mobilenet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.mobilenet.tensorflow - docTR documentation
@@ -793,7 +793,7 @@ Source code for doctr.models.classification.mobilenet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
index 095d377f31..aed4343741 100644
--- a/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/resnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.resnet.tensorflow - docTR documentation
@@ -749,7 +749,7 @@ Source code for doctr.models.classification.resnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
index ad254ebbfb..c5567d7d67 100644
--- a/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/textnet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.textnet.tensorflow - docTR documentation
@@ -611,7 +611,7 @@ Source code for doctr.models.classification.textnet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
index 01ae452624..788111ae87 100644
--- a/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vgg/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vgg.tensorflow - docTR documentation
@@ -451,7 +451,7 @@ Source code for doctr.models.classification.vgg.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
index 1333cf6045..971ba5abe9 100644
--- a/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/classification/vit/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.vit.tensorflow - docTR documentation
@@ -533,7 +533,7 @@ Source code for doctr.models.classification.vit.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/classification/zoo.html b/v0.2.0/_modules/doctr/models/classification/zoo.html
index f7796a7522..3eb2a3ec4e 100644
--- a/v0.2.0/_modules/doctr/models/classification/zoo.html
+++ b/v0.2.0/_modules/doctr/models/classification/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.classification.zoo - docTR documentation
@@ -447,7 +447,7 @@ Source code for doctr.models.classification.zoo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
deleted file mode 100644
index aef0023c40..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization.html
+++ /dev/null
@@ -1,876 +0,0 @@
- doctr.models.detection.differentiable_binarization - docTR documentation
-
- Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to expand (unshrink) polygons
- max_candidates: maximum number of boxes to consider in a single page
- box_thresh: minimal objectness score to keep a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: The first parameter.
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
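The expansion distance follows the DB paper's A * unclip_ratio / L rule. For intuition, a square of side s has area s**2 and perimeter 4s, so the default ratio of 1.5 offsets each edge by 0.375 * s; a quick hedged check:

>>> import numpy as np
>>> from shapely.geometry import Polygon
>>> s = 100.0
>>> square = Polygon([(0, 0), (s, 0), (s, s), (0, s)])
>>> distance = square.area * 1.5 / square.length  # A * r / L
>>> bool(np.isclose(distance, 0.375 * s))
True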
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- numpy array of boxes for the bitmap, where each box is a 5-element list
- containing xmin, ymin, xmax, ymax, score (in relative coordinates)
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
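Boxes are returned in relative coordinates so downstream consumers stay resolution-independent; converting back to absolute pixels is a one-liner (hedged sketch with an assumed bitmap size):

>>> import numpy as np
>>> boxes = np.array([[0.1, 0.2, 0.5, 0.4, 0.9]])  # xmin, ymin, xmax, ymax, score
>>> h, w = 1024, 1024  # assumed bitmap size
>>> abs_boxes = (boxes[:, :4] * np.array([w, h, w, h])).round().astype(int)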
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of output channels
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): up-sampling factor applied to the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1):
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
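A hedged shape walk-through of the top-down pass, assuming a 1024 x 1024 input and the ResNet-50 feature maps configured above:

# inner_blocks map the 4 inputs to `channels` each: 256x256, 128x128, 64x64, 32x32
# top-down loop: results[2] += up(results[3]); results[1] += up(results[2]); results[0] += up(results[1])
# layer_blocks then upsample by 1, 2, 4, 8 so every map reaches 256x256 before concatenation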
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature map is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
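The law-of-cosines formulation above reduces to the perpendicular point-to-segment distance whenever the projection falls inside [a, b]; a hedged sanity check on a horizontal segment, calling the static method directly:

>>> import numpy as np
>>> xs, ys = np.array([[5.0]]), np.array([[3.0]])  # single query point (5, 3)
>>> a, b = np.array([0.0, 0.0]), np.array([10.0, 0.0])
>>> d = DBNet.compute_distance(xs, ys, a, b)
>>> bool(np.isclose(d[0, 0], 3.0))  # perpendicular distance to the segment y=0
True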
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon: array of coordinates delimiting the polygon boundary
- canvas: threshold map to fill with polygons
- mask: mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Shift the polygon into the padded bounding box frame for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Build coordinate grids over the padded bounding box
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for the approximate binary map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
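Written out, the objective matches the DB paper's weighting:

    total_loss = 10.0 * l1_loss + 5.0 * balanced_bce_loss + dice_loss

where the balanced BCE supervises the probability map (hard negatives capped at a 3:1 ratio against positives), the dice term supervises the approximate binary map produced by the steep sigmoid above, and the L1 term supervises the threshold map.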
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
- def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
index 4325d0b74a..66cef8663d 100644
--- a/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/differentiable_binarization/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.differentiable_binarization.tensorflow - docTR documentation
@@ -759,7 +759,7 @@ Source code for doctr.models.detection.differentiable_binarization.tensorflo
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
index 5b84d2dea1..65e1a77af8 100644
--- a/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/fast/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.fast.tensorflow - docTR documentation
@@ -769,7 +769,7 @@ Source code for doctr.models.detection.fast.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet.html b/v0.2.0/_modules/doctr/models/detection/linknet.html
deleted file mode 100644
index 42db111bb3..0000000000
--- a/v0.2.0/_modules/doctr/models/detection/linknet.html
+++ /dev/null
@@ -1,637 +0,0 @@
- doctr.models.detection.linknet - docTR documentation
-
- Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the probability map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from the LinkNet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- numpy array of boxes for the bitmap, where each box is a 5-element list
- containing xmin, ymin, xmax, ymax, score (in relative coordinates)
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num):
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionary where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
- def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
index dbb58e37cf..ce995f99d4 100644
--- a/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/detection/linknet/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.linknet.tensorflow - docTR documentation
@@ -716,7 +716,7 @@ Source code for doctr.models.detection.linknet.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/detection/zoo.html b/v0.2.0/_modules/doctr/models/detection/zoo.html
index 55630ebacb..3651c4e2d3 100644
--- a/v0.2.0/_modules/doctr/models/detection/zoo.html
+++ b/v0.2.0/_modules/doctr/models/detection/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.detection.zoo - docTR documentation
@@ -225,15 +225,42 @@ Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
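The reworked zoo accepts either an architecture name or an already-instantiated detection model; a hedged usage sketch (assuming one of the supported backends is installed):

>>> import numpy as np
>>> from doctr.models import db_resnet50, detection_predictor
>>> model = db_resnet50(pretrained=True)
>>> predictor = detection_predictor(arch=model, assume_straight_pages=True)
>>> page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
>>> out = predictor([page])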
@@ -354,8 +449,8 @@ Source code for doctr.models.detection.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/export.html b/v0.2.0/_modules/doctr/models/export.html
deleted file mode 100644
index f25a81aa21..0000000000
--- a/v0.2.0/_modules/doctr/models/export.html
+++ /dev/null
@@ -1,411 +0,0 @@
- doctr.models.export - docTR documentation
-
- Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized TFLite model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/factory/hub.html b/v0.2.0/_modules/doctr/models/factory/hub.html
index 8274a809f5..756b2c7a17 100644
--- a/v0.2.0/_modules/doctr/models/factory/hub.html
+++ b/v0.2.0/_modules/doctr/models/factory/hub.html
@@ -13,7 +13,7 @@
-
+
doctr.models.factory.hub - docTR documentation
@@ -568,7 +568,7 @@ Source code for doctr.models.factory.hub
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn.html b/v0.2.0/_modules/doctr/models/recognition/crnn.html
deleted file mode 100644
index db8bbc2c27..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/crnn.html
+++ /dev/null
@@ -1,579 +0,0 @@
- doctr.models.recognition.crnn - docTR documentation
-
- Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill([logits.shape[0]], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
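CTC greedy decoding takes the argmax class per timestep, merges consecutive repeats, then drops blanks (index num_classes - 1); a hedged toy illustration:

>>> import tensorflow as tf
>>> # 1 sample, 4 timesteps, 3 classes (2 symbols + blank); softmax is optional before an argmax
>>> logits = tf.constant([[[5., 0., 0.], [5., 0., 0.], [0., 0., 5.], [0., 5., 0.]]])
>>> decoded, _ = tf.nn.ctc_greedy_decoder(
...     tf.transpose(logits, perm=[1, 0, 2]), sequence_length=[4], merge_repeated=True
... )
>>> tf.sparse.to_dense(decoded[0]).numpy()  # [0, 0, blank, 1] -> [0, 1]
array([[0, 1]])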
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs CTC decoding of the raw model output, then maps the predictions
- back to characters with the label-to-index dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of ground-truth labels (one string per sample)
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
index e50c245923..bc64da9a1b 100644
--- a/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/crnn/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.crnn.tensorflow - docTR documentation
@@ -658,7 +658,7 @@ Source code for doctr.models.recognition.crnn.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
index 152ebb7e59..aa6aa69325 100644
--- a/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/master/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.master.tensorflow - docTR documentation
@@ -655,7 +655,7 @@ Source code for doctr.models.recognition.master.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
index 0819737dfc..b181acef53 100644
--- a/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/parseq/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.parseq.tensorflow - docTR documentation
@@ -845,7 +845,7 @@ Source code for doctr.models.recognition.parseq.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar.html b/v0.2.0/_modules/doctr/models/recognition/sar.html
deleted file mode 100644
index 7b3a3e74b1..0000000000
--- a/v0.2.0/_modules/doctr/models/recognition/sar.html
+++ /dev/null
@@ -1,709 +0,0 @@
-
- doctr.models.recognition.sar - docTR documentation
-
- Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
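
A standalone sketch (assumed toy shapes) of the weighted reduction this module performs, which also shows why the glimpse has C channels rather than a single value:

import tensorflow as tf

N, H, W, C = 1, 4, 8, 16
features = tf.random.normal((N, H, W, C))
scores = tf.reshape(tf.random.normal((N, H, W, 1)), (N, H * W))
attention_map = tf.reshape(tf.nn.softmax(scores), (N, H, W, 1))  # weights sum to 1 over H*W
glimpse = tf.reduce_sum(features * attention_map, axis=[1, 2])   # shape (N, C)
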
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill([features.shape[0]], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embeded_symbol: shape (N, embedding_units)
- embeded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embeded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
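
A minimal sketch (hypothetical values) of the masking logic above, showing how timesteps after <eos> are zeroed out before averaging:

import tensorflow as tf

cce = tf.constant([[0.5, 0.4, 0.3, 0.2]])                 # per-timestep CE for one sample
seq_len = tf.constant([2]) + 1                            # word length + 1 for <eos>
mask_2d = tf.sequence_mask(seq_len, 4)                    # [[True, True, True, False]]
masked = tf.where(mask_2d, cce, tf.zeros_like(cce))
ce_loss = tf.reduce_sum(masked, axis=1) / tf.cast(seq_len, tf.float32)  # (0.5+0.4+0.3)/3
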
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
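
The <eos> splitting trick above is easy to check in isolation; a small sketch with made-up strings:

import tensorflow as tf

raw = tf.constant(["cat<eos>padpad", "dog<eos>x", "no-eos-here"])
split = tf.strings.split(raw, "<eos>")
first = tf.sparse.to_dense(split.to_sparse(), default_value="not valid")[:, 0]
print([w.decode() for w in first.numpy()])  # ['cat', 'dog', 'no-eos-here']
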
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example:
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
index 010bc2bc54..4a591e6451 100644
--- a/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/sar/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.sar.tensorflow - docTR documentation
@@ -757,7 +757,7 @@ Source code for doctr.models.recognition.sar.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
index 6e101893bf..c594d40a56 100644
--- a/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
+++ b/v0.2.0/_modules/doctr/models/recognition/vitstr/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.vitstr.tensorflow - docTR documentation
@@ -621,7 +621,7 @@ Source code for doctr.models.recognition.vitstr.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/models/recognition/zoo.html b/v0.2.0/_modules/doctr/models/recognition/zoo.html
index a4d43d1801..f664304019 100644
--- a/v0.2.0/_modules/doctr/models/recognition/zoo.html
+++ b/v0.2.0/_modules/doctr/models/recognition/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.recognition.zoo - docTR documentation
@@ -225,15 +225,42 @@ Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
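
A hedged usage sketch of the refactored factory (assumes a working TensorFlow or PyTorch install; the list-of-(word, confidence) output format is the one documented for recent docTR releases):

import numpy as np
from doctr.models import recognition_predictor

predictor = recognition_predictor("crnn_vgg16_bn", pretrained=True, batch_size=32)
crops = [(255 * np.random.rand(32, 128, 3)).astype(np.uint8)]  # word-level crops
out = predictor(crops)  # list of (word, confidence) pairs, one per crop
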
@@ -354,8 +414,8 @@ Source code for doctr.models.recognition.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/models/zoo.html b/v0.2.0/_modules/doctr/models/zoo.html
index dec6857019..d459671648 100644
--- a/v0.2.0/_modules/doctr/models/zoo.html
+++ b/v0.2.0/_modules/doctr/models/zoo.html
@@ -13,7 +13,7 @@
-
+
doctr.models.zoo - docTR documentation
@@ -225,15 +225,42 @@ Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates the page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
+def _kie_predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> KIEPredictor:
+ # Detection
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
+
+ # Recognition
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
+
+ return KIEPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
+
+[docs]
+def kie_predictor(
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
+) -> KIEPredictor:
+ """End-to-end KIE architecture using one model for localization, and another for text recognition.
+
+ >>> import numpy as np
+ >>> from doctr.models import kie_predictor
+ >>> model = kie_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
+
+ Args:
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
+ pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates the page before passing it again to the deep learning detection module.
+ Doing so will improve performance for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly increase the overall latency.
+ kwargs: keyword args of `OCRPredictor`
+
+ Returns:
+ -------
+ KIE predictor
+ """
+ return _kie_predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
@@ -353,8 +575,8 @@ Source code for doctr.models.zoo
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules.html b/v0.2.0/_modules/doctr/transforms/modules.html
deleted file mode 100644
index 214233e166..0000000000
--- a/v0.2.0/_modules/doctr/transforms/modules.html
+++ /dev/null
@@ -1,716 +0,0 @@
-
- doctr.transforms.modules - docTR documentation
-
- Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example:
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example:
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example:
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example:
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Gamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
\ No newline at end of file
diff --git a/v0.2.0/_modules/doctr/transforms/modules/base.html b/v0.2.0/_modules/doctr/transforms/modules/base.html
index 96ebd680b7..4596df3848 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/base.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/base.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.base - docTR documentation
@@ -643,7 +643,7 @@ Source code for doctr.transforms.modules.base
-
+
diff --git a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
index 0e18bcc922..acbbe96225 100644
--- a/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
+++ b/v0.2.0/_modules/doctr/transforms/modules/tensorflow.html
@@ -13,7 +13,7 @@
-
+
doctr.transforms.modules.tensorflow - docTR documentation
@@ -956,7 +956,7 @@ Source code for doctr.transforms.modules.tensorflow
-
+
diff --git a/v0.2.0/_modules/doctr/utils/metrics.html b/v0.2.0/_modules/doctr/utils/metrics.html
index afd16328c6..8a37d5949a 100644
--- a/v0.2.0/_modules/doctr/utils/metrics.html
+++ b/v0.2.0/_modules/doctr/utils/metrics.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.metrics - docTR documentation
@@ -225,15 +225,42 @@ Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
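
A quick sketch of the four tolerance levels (assuming, as the comment above implies, that anyascii('€') == 'EUR'):

print(string_match("Hello", "Hello"))  # (True, True, True, True)
print(string_match("Hello", "hello"))  # (False, True, False, True)
print(string_match("EUR", "€"))        # (False, False, True, True)
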
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
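
Following the docstring example, a sketch of the expected summary; the values below are derived by hand from string_match on each pair:

metric = TextMatch()
metric.update(['Hello', 'world'], ['hello', 'world'])
print(metric.summary())
# {'raw': 0.5, 'caseless': 1.0, 'anyascii': 0.5, 'unicase': 1.0}
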
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
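
A worked example for box_iou with hypothetical boxes:

import numpy as np

boxes_1 = np.array([[0, 0, 100, 100]])
boxes_2 = np.array([[0, 0, 50, 100]])
# intersection = 50 * 100 = 5000; union = 10000 + 5000 - 5000 = 10000
print(box_iou(boxes_1, boxes_2))  # [[0.5]]
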
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+ mask_shape: spatial shape of the intermediate masks
+ use_broadcasting: if set to True, leverage broadcasting speedup by consuming more memory
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
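
Same idea for the rotated case: a sketch with two overlapping squares expressed as (N, 4, 2) polygons:

import numpy as np

polys_1 = np.array([[[0, 0], [2, 0], [2, 2], [0, 2]]], dtype=float)
polys_2 = np.array([[[1, 0], [3, 0], [3, 2], [1, 2]]], dtype=float)
# intersection = 1 x 2 = 2; union = 4 + 4 - 2 = 6
print(polygon_iou(polys_1, polys_2))  # [[0.3333...]]
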
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
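
A worked example of the suppression loop (hypothetical boxes; the second box overlaps the first with IoU ≈ 0.81 and is dropped):

import numpy as np

boxes = np.array([
    [0.0, 0.0, 10.0, 10.0, 0.9],
    [1.0, 1.0, 10.0, 10.0, 0.8],   # IoU with the first box: 81 / (100 + 81 - 81) = 0.81
    [20.0, 20.0, 30.0, 30.0, 0.7],
])
print(nms(boxes, thresh=0.5))  # [0, 2]
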
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
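+ # for each prediction, accumulate its best IoU against any ground truth (numerator of meanIoU)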
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
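Matching is now delegated to scipy's optimal assignment solver: negating the IoU matrix turns the maximum-total-IoU pairing into a minimum-cost assignment, so each ground truth is paired with at most one prediction. A small hand-written illustration:

>>> import numpy as np
>>> from scipy.optimize import linear_sum_assignment
>>> iou_mat = np.array([[0.6, 0.1],
>>>                     [0.2, 0.7],
>>>                     [0.0, 0.3]])  # 3 ground truths x 2 predictions
>>> linear_sum_assignment(-iou_mat)  # -> (array([0, 1]), array([0, 1])): GT 0 <-> pred 0, GT 1 <-> pred 1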
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
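Judging by the counter names above, `string_match` (resolved from this module's namespace) returns four booleans: exact, case-insensitive, ASCII-transliterated, and transliterated case-insensitive equality. A sketch of the expected behaviour:

>>> string_match("café", "cafe")  # expected: (False, False, True, True)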
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
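Putting the class together, a perfect single-box, single-class prediction should yield unit scores (a minimal sketch using relative coordinates):

>>> import numpy as np
>>> metric = DetectionMetric(iou_thresh=0.5)
>>> metric.update(
>>>     np.array([[0.1, 0.1, 0.4, 0.4]]), np.array([[0.1, 0.1, 0.4, 0.4]]),
>>>     np.array([0], dtype=np.int64), np.array([0], dtype=np.int64),
>>> )
>>> metric.summary()  # -> (1.0, 1.0, 1.0)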
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
-
-
+
+
diff --git a/v0.2.0/_modules/doctr/utils/visualization.html b/v0.2.0/_modules/doctr/utils/visualization.html
index 3e5bc073f8..c818be6d7b 100644
--- a/v0.2.0/_modules/doctr/utils/visualization.html
+++ b/v0.2.0/_modules/doctr/utils/visualization.html
@@ -13,7 +13,7 @@
-
+
doctr.utils.visualization - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor was set up with preserve_aspect_ratio=True
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
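A quick check of the relative-to-absolute conversion above (with the default preserve_aspect_ratio=False): on a (height, width) = (100, 200) page, the relative box ((0.1, 0.2), (0.5, 0.4)) becomes an 80 x 20 rectangle anchored at (20, 20):

>>> patch = rect_patch(((0.1, 0.2), (0.5, 0.4)), (100, 200))
>>> patch.get_xy(), patch.get_width(), patch.get_height()  # -> approximately ((20.0, 20.0), 80.0, 20.0)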
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if the predictor was set up with preserve_aspect_ratio=True
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if geometry.shape != (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
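+ # note: the two lines above rescale the caller's array in place; pass a copy if the geometry is reused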
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create mlp Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plot, adds text labels on top of bounding box
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create mlp Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
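A minimal sketch of `draw_boxes` on a synthetic image (it draws with OpenCV, then hands the result to matplotlib, so a trailing plt.show() displays it):

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> canvas = np.zeros((100, 200, 3), dtype=np.uint8)
>>> draw_boxes(np.array([[0.1, 0.1, 0.5, 0.5]]), canvas)
>>> plt.show()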
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
-
-
+
+
diff --git a/v0.2.0/_modules/index.html b/v0.2.0/_modules/index.html
index dc72311281..5793c44f20 100644
--- a/v0.2.0/_modules/index.html
+++ b/v0.2.0/_modules/index.html
@@ -13,7 +13,7 @@
-
+
Overview: module code - docTR documentation
@@ -225,15 +225,42 @@
-
-
+
+
diff --git a/v0.2.0/_sources/datasets.rst.txt b/v0.2.0/_sources/datasets.rst.txt
deleted file mode 100644
index d2080bc034..0000000000
--- a/v0.2.0/_sources/datasets.rst.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-doctr.datasets
-==============
-
-.. currentmodule:: doctr.datasets
-
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-
-.. _datasets:
-
-Available Datasets
-------------------
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
-.. autoclass:: doctr.datasets.core.VisionDataset
-
-
-Here are all datasets that are available through DocTR:
-
-.. autoclass:: FUNSD
-.. autoclass:: SROIE
-.. autoclass:: CORD
-..autoclass:: OCRDataset
-
-
-Data Loading
-------------
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
-.. autoclass:: doctr.datasets.loader.DataLoader
-
-
-.. _vocabs:
-
-Supported Vocabs
-----------------
-
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-.. list-table:: DocTR Vocabs
- :widths: 20 5 50
- :header-rows: 1
-
- * - Name
- - size
- - characters
- * - digits
- - 10
- - 0123456789
- * - ascii_letters
- - 52
- - abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
- * - punctuation
- - 32
- - !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
- * - currency
- - 5
- - £€¥¢฿
- * - latin
- - 96
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°
- * - french
- - 154
- - 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-.. autofunction:: encode_sequences
diff --git a/v0.2.0/_sources/documents.rst.txt b/v0.2.0/_sources/documents.rst.txt
deleted file mode 100644
index e2fa11b344..0000000000
--- a/v0.2.0/_sources/documents.rst.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-doctr.documents
-===============
-
-
-.. currentmodule:: doctr.documents
-
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-
-Document structure
-------------------
-
-Structural organization of the documents.
-
-Word
-^^^^
-A Word is an uninterrupted sequence of characters.
-
-.. autoclass:: Word
-
-Line
-^^^^
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
-.. autoclass:: Line
-
-Artefact
-^^^^^^^^
-
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
-.. autoclass:: Artefact
-
-Block
-^^^^^
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
-.. autoclass:: Block
-
-Page
-^^^^
-
-A Page is a collection of Blocks that were on the same physical page.
-
-.. autoclass:: Page
-
-
-Document
-^^^^^^^^
-
-A Document is a collection of Pages.
-
-.. autoclass:: Document
-
-
-File reading
-------------
-
-High-performance file reading and conversion to processable structured data.
-
-.. autofunction:: read_pdf
-
-.. autofunction:: read_img
-
-.. autofunction:: read_html
-
-
-.. autoclass:: DocumentFile
-
- .. automethod:: from_pdf
-
- .. automethod:: from_url
-
- .. automethod:: from_images
-
-.. autoclass:: PDF
-
- .. automethod:: as_images
-
- .. automethod:: get_words
-
- .. automethod:: get_artefacts
diff --git a/v0.2.0/_sources/getting_started/installing.rst.txt b/v0.2.0/_sources/getting_started/installing.rst.txt
index e764e734a7..39e79aa3dd 100644
--- a/v0.2.0/_sources/getting_started/installing.rst.txt
+++ b/v0.2.0/_sources/getting_started/installing.rst.txt
@@ -3,7 +3,7 @@
Installation
************
-This library requires `Python `_ 3.9 or higher.
+This library requires `Python `_ 3.10 or higher.
Prerequisites
diff --git a/v0.2.0/_sources/index.rst.txt b/v0.2.0/_sources/index.rst.txt
index a7d5ef909e..53251db142 100644
--- a/v0.2.0/_sources/index.rst.txt
+++ b/v0.2.0/_sources/index.rst.txt
@@ -1,75 +1,122 @@
-DocTR: Document Text Recognition
-================================
+********************************
+docTR: Document Text Recognition
+********************************
+
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
+.. image:: https://github.com/mindee/doctr/releases/download/v0.2.0/ocr.png
+ :align: center
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
DocTR provides an easy and powerful way to extract valuable information from your documents:
-* |:receipt:| **for automation**: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+* |:receipt:| **for automation**: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
* |:woman_scientist:| **for research**: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository `doctr `_.
+Main Features
+-------------
-Features
---------
-
-* |:robot:| Robust 2-stages (detection + recognition) OCR predictors fully trained
+* |:robot:| Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
* |:zap:| User-friendly, 3 lines of code to load a document and extract text with a predictor
-* |:rocket:| State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-* |:zap:| Predictors optimized to be very fast on both CPU & GPU
-* |:bird:| Light package, small dependencies
-* |:tools:| Daily maintained
-* |:factory:| Easily integrable
-
+* |:rocket:| State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+* |:zap:| Optimized for inference speed on both CPU & GPU
+* |:bird:| Light package, minimal dependencies
+* |:tools:| Actively maintained by Mindee
+* |:factory:| Easy integration (available templates for browser demo & API deployment)
-|:scientist:| Build & train your predictor
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-* |:construction_worker:| Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-* |:construction_worker:| Fine-tune or train from scratch any detection or recognition model to specialize on your data
+.. toctree::
+ :maxdepth: 2
+ :caption: Getting started
+ :hidden:
+
+ getting_started/installing
+ notebooks
+
+
+Model zoo
+^^^^^^^^^
+
+Text detection models
+"""""""""""""""""""""
+* DBNet from `"Real-time Scene Text Detection with Differentiable Binarization" `_
+* LinkNet from `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_
+* FAST from `"FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" `_
+
+Text recognition models
+"""""""""""""""""""""""
+* SAR from `"Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition" `_
+* CRNN from `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_
+* MASTER from `"MASTER: Multi-Aspect Non-local Network for Scene Text Recognition" `_
+* ViTSTR from `"Vision Transformer for Fast and Efficient Scene Text Recognition" `_
+* PARSeq from `"Scene Text Recognition with Permuted Autoregressive Sequence Models" `_
+
+
+Supported datasets
+^^^^^^^^^^^^^^^^^^
+* FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
+* CORD from `"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" `_.
+* SROIE from `ICDAR 2019 `_.
+* IIIT-5k from `CVIT `_.
+* Street View Text from `"End-to-End Scene Text Recognition" `_.
+* SynthText from `Visual Geometry Group `_.
+* SVHN from `"Reading Digits in Natural Images with Unsupervised Feature Learning" `_.
+* IC03 from `ICDAR 2003 `_.
+* IC13 from `ICDAR 2013 `_.
+* IMGUR5K from `"TextStyleBrush: Transfer of Text Aesthetics from a Single Example" `_.
+* MJSynth from `"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition" `_.
+* IIITHWS from `"Generating Synthetic Data for Text Recognition" `_.
+* WILDRECEIPT from `"Spatial Dual-Modality Graph Reasoning for Key Information Extraction" `_.
-|:toolbox:| Implemented models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Detection models
-""""""""""""""""
- * DB (Differentiable Binarization), `"Real-time Scene Text Detection with Differentiable Binarization" `_.
- * LinkNet, `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Using docTR
+ :hidden:
-Recognition models
-""""""""""""""""""
- * SAR (Show, Attend and Read), `"Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition" `_.
- * CRNN (Convolutional Recurrent Neural Network), `"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" `_.
+ using_doctr/using_models
+ using_doctr/using_datasets
+ using_doctr/using_contrib_modules
+ using_doctr/sharing_models
+ using_doctr/using_model_export
+ using_doctr/custom_models_training
+ using_doctr/running_on_aws
-|:receipt:| Integrated datasets
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * FUNSD from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents" `_.
- * CORD from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing" `_.
+.. toctree::
+ :maxdepth: 2
+ :caption: Community
+ :hidden:
+ community/resources
-Getting Started
----------------
.. toctree::
:maxdepth: 2
+ :caption: Package Reference
+ :hidden:
- installing
+ modules/contrib
+ modules/datasets
+ modules/io
+ modules/models
+ modules/transforms
+ modules/utils
-Contents
---------
-
.. toctree::
- :maxdepth: 1
+ :maxdepth: 2
+ :caption: Contributing
+ :hidden:
- datasets
- documents
- models
- transforms
- utils
+ contributing/code_of_conduct
+ contributing/contributing
-.. automodule:: doctr
- :members:
+.. toctree::
+ :maxdepth: 2
+ :caption: Notes
+ :hidden:
+
+ changelog
diff --git a/v0.2.0/_sources/installing.rst.txt b/v0.2.0/_sources/installing.rst.txt
deleted file mode 100644
index ee7de4dbc0..0000000000
--- a/v0.2.0/_sources/installing.rst.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-
-************
-Installation
-************
-
-This library requires Python 3.6 or newer.
-
-Via Python Package
-==================
-
-Install the last stable release of the package using pip:
-
-.. code:: bash
-
- pip install python-doctr
-
-
-Via Git
-=======
-
-Install the library in developper mode:
-
-.. code:: bash
-
- git clone https://github.com/mindee/doctr.git
- pip install -e doctr/.
diff --git a/v0.2.0/_sources/models.rst.txt b/v0.2.0/_sources/models.rst.txt
deleted file mode 100644
index 410e9604f7..0000000000
--- a/v0.2.0/_sources/models.rst.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-doctr.models
-============
-
-The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
-Either performed at once or separately, to each task corresponds a type of deep learning architecture.
-
-.. currentmodule:: doctr.models
-
-For a given task, DocTR provides a Predictor, which is composed of 3 components:
-
-* PreProcessor: a module in charge of making inputs directly usable by the TensorFlow model.
-* Model: a deep learning model, implemented with TensorFlow backend.
-* PostProcessor: making model outputs structured and reusable.
-
-
-Text Detection
---------------
-Localizing text elements in images
-
-+---------------------------------------------------+----------------------------+----------------------------+---------+
-| | FUNSD | CORD | |
-+==================+=================+==============+============+===============+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **Recall** | **Precision** | **FPS** |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-| db_resnet50 | (1024, 1024, 3) | | 0.733 | 0.817 | 0.745 | 0.875 | 2.1 |
-+------------------+-----------------+--------------+------------+---------------+------------+---------------+---------+
-
-All text detection models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 1024, 1024, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 1024, 1024, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for detection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for detection is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) with potential deformation.
-2. batch images together
-3. normalize the batch using the training data statistics
-
-
-Detection models
-^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-.. autofunction:: doctr.models.detection.db_resnet50
-.. autofunction:: doctr.models.detection.linknet
-
-
-Post-processing detections
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (binary segmentation map for instance), into a set of bounding boxes.
-
-
-Detection predictors
-^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage, predictors lets you pass numpy images as inputs and return structured information.
-
-.. autofunction:: doctr.models.detection.detection_predictor
-
-
-Text Recognition
-----------------
-Identifying strings in images
-
-.. list-table:: Text recognition model zoo
- :widths: 20 20 15 10 10 10
- :header-rows: 1
-
- * - Architecture
- - Input shape
- - # params
- - FUNSD
- - CORD
- - FPS
- * - crnn_vgg16_bn
- - (32, 128, 3)
- -
- - 0.860
- - 0.913
- - 12.8
- * - sar_vgg16_bn
- - (32, 128, 3)
- -
- - 0.862
- - 0.917
- - 3.3
- * - sar_resnet31
- - (32, 128, 3)
- -
- - **0.863**
- - **0.921**
- - 2.7
-
-All text recognition models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All these recognition models are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 30595 word-level crops which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the model, we feed the model with 100 random tensors of shape [1, 32, 128, 3] as a warm-up. Then, we measure the average speed of the model on 1000 batches of 1 frame (random tensors of shape [1, 32, 128, 3]).
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Pre-processing for recognition
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-In DocTR, the pre-processing scheme for recognition is the following:
-
-1. resize each input image to the target size (bilinear interpolation by default) without deformation.
-2. pad the image to the target size (with zeros by default)
-3. batch images together
-4. normalize the batch using the training data statistics
-
-Recognition models
-^^^^^^^^^^^^^^^^^^
-Models expect a TensorFlow tensor as input and produces one in return. DocTR includes implementations and pretrained versions of the following models:
-
-
-.. autofunction:: doctr.models.recognition.crnn_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_vgg16_bn
-.. autofunction:: doctr.models.recognition.sar_resnet31
-
-Post-processing outputs
-^^^^^^^^^^^^^^^^^^^^^^^
-The purpose of this block is to turn the model output (symbol classification for the sequence), into a set of strings.
-
-Recognition predictors
-^^^^^^^^^^^^^^^^^^^^^^
-Combining the right components around a given architecture for easier usage.
-
-.. autofunction:: doctr.models.recognition.recognition_predictor
-
-
-End-to-End OCR
---------------
-Predictors that localize and identify text elements in images
-
-+--------------------------------------------------------------+--------------------------------------+--------------------------------------+
-| | FUNSD | CORD |
-+=============================+=================+==============+============+===============+=========+============+===============+=========+
-| **Architecture** | **Input shape** | **# params** | **Recall** | **Precision** | **FPS** | **Recall** | **Precision** | **FPS** |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + crnn_vgg16_bn | (1024, 1024, 3) | | 0.629 | 0.701 | 0.85 | 0.664 | 0.780 | 1.6 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_vgg16_bn | (1024, 1024, 3) | | 0.630 | 0.702 | 0.49 | 0.666 | 0.783 | 1.0 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| db_resnet50 + sar_resnet31 | (1024, 1024, 3) | | 0.640 | 0.713 | 0.27 | 0.672 | **0.789** | 0.83 |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision text detection | NA | | 0.595 | 0.625 | | 0.753 | 0.700 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| Gvision doc. text detection | NA | | 0.640 | 0.533 | | 0.689 | 0.611 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-| aws textract | NA | | **0.781** | **0.830** | | **0.875** | 0.660 | |
-+-----------------------------+-----------------+--------------+------------+---------------+---------+------------+---------------+---------+
-
-All OCR models above have been evaluated using both the training and evaluation sets of FUNSD and CORD (cf. :ref:`datasets`).
-Explanations about the metrics being used are available in :ref:`metrics`.
-
-All recognition models of predictors are trained with our french vocab (cf. :ref:`vocabs`).
-
-*Disclaimer: both FUNSD subsets combine have 199 pages which might not be representative enough of the model capabilities*
-
-FPS (Frames per second) is computed this way: we instantiate the predictor, we warm-up the model and then we measure the average speed of the end-to-end predictor on the datasets, with a batch size of 1.
-We used a c5.x12large from AWS instances (CPU Xeon Platinum 8275L) to perform experiments.
-
-Two-stage approaches
-^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection, and one stage of text recognition. The text detection will be used to produces cropped images that will be passed into the text recognition block.
-
-.. autofunction:: doctr.models.zoo.ocr_predictor
-
-
-Model export
-------------
-Utility functions to make the most of document analysis models.
-
-.. currentmodule:: doctr.models.export
-
-Model compression
-^^^^^^^^^^^^^^^^^
-
-.. autofunction:: convert_to_tflite
-
-.. autofunction:: convert_to_fp16
-
-.. autofunction:: quantize_model
-
-Using SavedModel
-^^^^^^^^^^^^^^^^
-
-Additionally, models in DocTR inherit TensorFlow 2 model properties and can be exported to
-`SavedModel `_ format as follows:
-
-
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_t = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> _ = model(input_t, training=False)
- >>> tf.saved_model.save(model, 'path/to/your/folder/db_resnet50/')
-
-And loaded just as easily:
-
-
- >>> import tensorflow as tf
- >>> model = tf.saved_model.load('path/to/your/folder/db_resnet50/')
diff --git a/v0.2.0/_sources/transforms.rst.txt b/v0.2.0/_sources/transforms.rst.txt
deleted file mode 100644
index 0230fe75f5..0000000000
--- a/v0.2.0/_sources/transforms.rst.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-doctr.transforms
-================
-
-.. currentmodule:: doctr.transforms
-
-Data transformations are part of both training and inference procedure. Drawing inspiration from the design of `torchvision `_, we express transformations as composable modules.
-
-
-Supported transformations
--------------------------
-Here are all transformations that are available through DocTR:
-
-.. autoclass:: Resize
-.. autoclass:: Normalize
-.. autoclass:: LambdaTransformation
-.. autoclass:: ToGray
-.. autoclass:: ColorInversion
-.. autoclass:: RandomBrightness
-.. autoclass:: RandomContrast
-.. autoclass:: RandomSaturation
-.. autoclass:: RandomHue
-.. autoclass:: RandomGamma
-.. autoclass:: RandomJpegQuality
-
-
-Composing transformations
----------------------------------------------
-It is common to require several transformations to be performed consecutively.
-
-.. autoclass:: Compose
-.. autoclass:: OneOf
-.. autoclass:: RandomApply
diff --git a/v0.2.0/_sources/utils.rst.txt b/v0.2.0/_sources/utils.rst.txt
deleted file mode 100644
index 1a02858378..0000000000
--- a/v0.2.0/_sources/utils.rst.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-doctr.utils
-===========
-
-This module regroups non-core features that are complementary to the rest of the package.
-
-.. currentmodule:: doctr.utils
-
-
-Visualization
--------------
-Easy-to-use functions to make sense of your model's predictions.
-
-.. currentmodule:: doctr.utils.visualization
-
-.. autofunction:: visualize_page
-
-
-.. _metrics:
-
-Task evaluation
----------------
-Implementations of task-specific metrics to easily assess your model performances.
-
-.. currentmodule:: doctr.utils.metrics
-
-.. autoclass:: ExactMatch
-
-.. autoclass:: LocalizationConfusion
-
-.. autoclass:: OCRMetric
diff --git a/v0.2.0/_static/basic.css b/v0.2.0/_static/basic.css
index f316efcb47..7ebbd6d07b 100644
--- a/v0.2.0/_static/basic.css
+++ b/v0.2.0/_static/basic.css
@@ -1,12 +1,5 @@
/*
- * basic.css
- * ~~~~~~~~~
- *
* Sphinx stylesheet -- basic theme.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
/* -- main layout ----------------------------------------------------------- */
@@ -115,15 +108,11 @@ img {
/* -- search page ----------------------------------------------------------- */
ul.search {
- margin: 10px 0 0 20px;
- padding: 0;
+ margin-top: 10px;
}
ul.search li {
- padding: 5px 0 5px 20px;
- background-image: url(file.png);
- background-repeat: no-repeat;
- background-position: 0 7px;
+ padding: 5px 0;
}
ul.search li a {
diff --git a/v0.2.0/_static/doctools.js b/v0.2.0/_static/doctools.js
index 4d67807d17..0398ebb9f0 100644
--- a/v0.2.0/_static/doctools.js
+++ b/v0.2.0/_static/doctools.js
@@ -1,12 +1,5 @@
/*
- * doctools.js
- * ~~~~~~~~~~~
- *
* Base JavaScript utilities for all Sphinx HTML documentation.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
diff --git a/v0.2.0/_static/documentation_options.js b/v0.2.0/_static/documentation_options.js
index 40b838b240..4f656fdbea 100644
--- a/v0.2.0/_static/documentation_options.js
+++ b/v0.2.0/_static/documentation_options.js
@@ -1,5 +1,5 @@
const DOCUMENTATION_OPTIONS = {
- VERSION: '0.1.2a0-git',
+ VERSION: '0.10.1a0-git',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
diff --git a/v0.2.0/_static/language_data.js b/v0.2.0/_static/language_data.js
index 367b8ed81b..c7fe6c6faf 100644
--- a/v0.2.0/_static/language_data.js
+++ b/v0.2.0/_static/language_data.js
@@ -1,13 +1,6 @@
/*
- * language_data.js
- * ~~~~~~~~~~~~~~~~
- *
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
diff --git a/v0.2.0/_static/searchtools.js b/v0.2.0/_static/searchtools.js
index b08d58c9b9..2c774d17af 100644
--- a/v0.2.0/_static/searchtools.js
+++ b/v0.2.0/_static/searchtools.js
@@ -1,12 +1,5 @@
/*
- * searchtools.js
- * ~~~~~~~~~~~~~~~~
- *
* Sphinx JavaScript utilities for the full-text search.
- *
- * :copyright: Copyright 2007-2024 by the Sphinx team, see AUTHORS.
- * :license: BSD, see LICENSE for details.
- *
*/
"use strict";
@@ -20,7 +13,7 @@ if (typeof Scorer === "undefined") {
// and returns the new score.
/*
score: result => {
- const [docname, title, anchor, descr, score, filename] = result
+ const [docname, title, anchor, descr, score, filename, kind] = result
return score
},
*/
@@ -47,6 +40,14 @@ if (typeof Scorer === "undefined") {
};
}
+// Global search result kind enum, used by themes to style search results.
+class SearchResultKind {
+ static get index() { return "index"; }
+ static get object() { return "object"; }
+ static get text() { return "text"; }
+ static get title() { return "title"; }
+}
+
const _removeChildren = (element) => {
while (element && element.lastChild) element.removeChild(element.lastChild);
};
@@ -64,9 +65,13 @@ const _displayItem = (item, searchTerms, highlightTerms) => {
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
const contentRoot = document.documentElement.dataset.content_root;
- const [docName, title, anchor, descr, score, _filename] = item;
+ const [docName, title, anchor, descr, score, _filename, kind] = item;
let listItem = document.createElement("li");
+ // Add a class representing the item's type:
+ // can be used by a theme's CSS selector for styling
+ // See SearchResultKind for the class names.
+ listItem.classList.add(`kind-${kind}`);
let requestUrl;
let linkUrl;
if (docBuilder === "dirhtml") {
@@ -115,8 +120,10 @@ const _finishSearch = (resultCount) => {
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
);
else
- Search.status.innerText = _(
- "Search finished, found ${resultCount} page(s) matching the search query."
+ Search.status.innerText = Documentation.ngettext(
+ "Search finished, found one page matching the search query.",
+ "Search finished, found ${resultCount} pages matching the search query.",
+ resultCount,
).replace('${resultCount}', resultCount);
};
const _displayNextItem = (
@@ -138,7 +145,7 @@ const _displayNextItem = (
else _finishSearch(resultCount);
};
// Helper function used by query() to order search results.
-// Each input is an array of [docname, title, anchor, descr, score, filename].
+// Each input is an array of [docname, title, anchor, descr, score, filename, kind].
// Order the results by score (in opposite order of appearance, since the
// `_displayNextItem` function uses pop() to retrieve items) and then alphabetically.
const _orderResultsByScoreThenName = (a, b) => {
@@ -248,6 +255,7 @@ const Search = {
searchSummary.classList.add("search-summary");
searchSummary.innerText = "";
const searchList = document.createElement("ul");
+ searchList.setAttribute("role", "list");
searchList.classList.add("search");
const out = document.getElementById("search-results");
@@ -318,7 +326,7 @@ const Search = {
const indexEntries = Search._index.indexentries;
// Collect multiple result groups to be sorted separately and then ordered.
- // Each is an array of [docname, title, anchor, descr, score, filename].
+ // Each is an array of [docname, title, anchor, descr, score, filename, kind].
const normalResults = [];
const nonMainIndexResults = [];
@@ -337,6 +345,7 @@ const Search = {
null,
score + boost,
filenames[file],
+ SearchResultKind.title,
]);
}
}
@@ -354,6 +363,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.index,
];
if (isMain) {
normalResults.push(result);
@@ -475,6 +485,7 @@ const Search = {
descr,
score,
filenames[match[0]],
+ SearchResultKind.object,
]);
};
Object.keys(objects).forEach((prefix) =>
@@ -585,6 +596,7 @@ const Search = {
null,
score,
filenames[file],
+ SearchResultKind.text,
]);
}
return results;
diff --git a/v0.2.0/changelog.html b/v0.2.0/changelog.html
index ac81a6f231..fc45a50384 100644
--- a/v0.2.0/changelog.html
+++ b/v0.2.0/changelog.html
@@ -14,7 +14,7 @@
Changelog - docTR documentation
@@ -446,7 +446,7 @@ v0.1.0 (2021-03-05)
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
- doctr.datasets - docTR documentation
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your preferred framework
-can save you a significant amount of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-.. autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset loads samples in its own way, but batch aggregation and iteration are handled by a dedicated object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = CORD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before being passed to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-Name           size   characters
-digits         10     0123456789
-ascii_letters  52     abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-punctuation    32     !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-currency       5      £€¥¢฿
-latin          96     0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-french         154    0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
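
For illustration, a minimal sketch of the encode_sequences signature documented above (the vocab string is a stand-in, not one of the built-in vocabs):

from doctr.datasets import encode_sequences

vocab = "abcdefghijklmnopqrstuvwxyz"
# two words, padded to 8 positions; unused slots take the eos value (-1)
encoded = encode_sequences(["hello", "doc"], vocab=vocab, target_size=8, eos=-1)
print(encoded.shape)  # (2, 8)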
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
- doctr.documents - docTR documentation
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page's size
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words on the same horizontal level but in different columns belong to two distinct Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
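
For illustration, a minimal sketch assembling the hierarchy described above (the geometry values are hypothetical relative coordinates, and passing the page list to Document is an assumption based on "a collection of Pages"):

from doctr.documents import Word, Line, Block, Page, Document

word = Word(value="Hello", confidence=0.99, geometry=((0.1, 0.1), (0.3, 0.15)))
line = Line(words=[word])    # geometry resolved from the enclosed words
block = Block(lines=[line])  # geometry resolved from lines and artefacts
page = Page(blocks=[block], page_idx=0, dimensions=(1024, 768))
doc = Document(pages=[page])  # assumption: Document wraps the list of pages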
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a web page and convert it into a PDF file, returned as a bytes stream
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
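
For illustration, a short sketch chaining the DocumentFile and PDF helpers documented above (the path is hypothetical):

from doctr.documents import DocumentFile

pdf = DocumentFile.from_pdf("path/to/your/doc.pdf")
pages = pdf.as_images()          # list of H x W x 3 numpy arrays
words = pdf.get_words()          # per-page lists of (bounding box, value)
artefacts = pdf.get_artefacts()  # per-page lists of bounding boxes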
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
@@ -364,7 +381,7 @@ Contents
Source code for doctr.datasets.cord
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['CORD']
+__all__ = ["CORD"]
-[docs]
+[docs]
class CORD(VisionDataset):
"""CORD dataset from `"CORD: A Consolidated Receipt Dataset forPost-OCR Parsing"
<https://openreview.net/pdf?id=SJl3z659UH>`_.
- Example::
- >>> from doctr.datasets import CORD
- >>> train_set = CORD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/cord-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import CORD
+ >>> train_set = CORD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_train.zip',
- '45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/cord_test.zip',
- '8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_train.zip&src=0",
+ "45f9dc77f126490f3e52d7cb4f70ef3c57e649ea86d19d862a2757c9c455d7f8",
+ "cord_train.zip",
+ )
+
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/cord_test.zip&src=0",
+ "8c895e3d6f7e1161c5b7245e3723ce15c04d84be89eaa6093949b75a66fb3c58",
+ "cord_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
-
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
-
- # # List images
- self.root = os.path.join(self._root, 'image')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
+
+ # List images
+ tmp_root = os.path.join(self.root, "image")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
- for img_path in os.listdir(self.root):
+ np_dtype = np.float32
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking CORD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
_targets = []
- with open(os.path.join(self._root, 'json', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, "json", f"{stem}.json"), "rb") as f:
label = json.load(f)
for line in label["valid_line"]:
for word in line["words"]:
- x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
- y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
- # Reduce 8 coords to 4
- left, right = min(x), max(x)
- top, bot = min(y), max(y)
if len(word["text"]) > 0:
- _targets.append((word["text"], [left, top, right, bot]))
+ x = word["quad"]["x1"], word["quad"]["x2"], word["quad"]["x3"], word["quad"]["x4"]
+ y = word["quad"]["y1"], word["quad"]["y2"], word["quad"]["y3"], word["quad"]["y4"]
+ box: Union[List[float], np.ndarray]
+ if use_polygons:
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box = np.array(
+ [
+ [x[0], y[0]],
+ [x[1], y[1]],
+ [x[2], y[2]],
+ [x[3], y[3]],
+ ],
+ dtype=np_dtype,
+ )
+ else:
+ # Reduce 8 coords to 4 -> xmin, ymin, xmax, ymax
+ box = [min(x), min(y), max(x), max(y)]
+ _targets.append((word["text"], box))
text_targets, box_targets = zip(*_targets)
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=int).clip(min=0)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=int).clip(min=0)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=int).clip(min=0), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
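
For illustration, a sketch of the constructor flags introduced in this hunk:

from doctr.datasets import CORD

train_set = CORD(train=True, download=True)                    # boxes + labels
poly_set = CORD(train=True, download=True, use_polygons=True)  # rotated polygons
rec_set = CORD(train=True, download=True, recognition_task=True)  # word crops + labels
# passing recognition_task=True and detection_task=True together
# raises the ValueError shown above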
Source code for doctr.datasets.core
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import os
-from pathlib import Path
-from zipfile import ZipFile
-from typing import List, Any, Optional
-
-from doctr.models.utils import download_from_url
-
-
-__all__ = ['AbstractDataset', 'VisionDataset']
-
-
-class AbstractDataset:
-
- data: List[Any] = []
-
- def __len__(self):
- return len(self.data)
-
- def __getitem__(self, index: int) -> Any:
- raise NotImplementedError
-
- def extra_repr(self) -> str:
- return ""
-
- def __repr__(self) -> str:
- return f"{self.__class__.__name__}({self.extra_repr()})"
-
-
-
-[docs]
-class VisionDataset(AbstractDataset):
- """Implements an abstract dataset
-
- Args:
- url: URL of the dataset
- file_name: name of the file once downloaded
- file_hash: expected SHA256 of the file
- extract_archive: whether the downloaded file is an archive to be extracted
- download: whether the dataset should be downloaded if not present on disk
- overwrite: whether the archive should be re-extracted
- """
-
- def __init__(
- self,
- url: str,
- file_name: Optional[str] = None,
- file_hash: Optional[str] = None,
- extract_archive: bool = False,
- download: bool = False,
- overwrite: bool = False,
- ) -> None:
-
- dataset_cache = os.path.join(os.path.expanduser('~'), '.cache', 'doctr', 'datasets')
-
- file_name = file_name if isinstance(file_name, str) else os.path.basename(url)
- # Download the file if not present
- archive_path = os.path.join(dataset_cache, file_name)
-
- if not os.path.exists(archive_path) and not download:
- raise ValueError("the dataset needs to be downloaded first with download=True")
-
- archive_path = download_from_url(url, file_name, file_hash, cache_subdir='datasets')
-
- # Extract the archive
- if extract_archive:
- archive_path = Path(archive_path)
- dataset_path = archive_path.parent.joinpath(archive_path.stem)
- if not dataset_path.is_dir() or overwrite:
- with ZipFile(archive_path, 'r') as f:
- f.extractall(path=dataset_path)
-
- # List images
- self._root = dataset_path if extract_archive else archive_path
- self.data: List[Any] = []
-
-
Source code for doctr.datasets.detection
Source code for doctr.datasets.doc_artefacts
Source code for doctr.datasets.funsd
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import json
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
-from .core import VisionDataset
+import numpy as np
+from tqdm import tqdm
+
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['FUNSD']
+__all__ = ["FUNSD"]
-[docs]
+[docs]
class FUNSD(VisionDataset):
"""FUNSD dataset from `"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents"
<https://arxiv.org/pdf/1905.13538.pdf>`_.
- Example::
- >>> from doctr.datasets import FUNSD
- >>> train_set = FUNSD(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/funsd-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import FUNSD
+ >>> train_set = FUNSD(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- URL = 'https://guillaumejaume.github.io/FUNSD/dataset.zip'
- SHA256 = 'c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f'
- FILE_NAME = 'funsd.zip'
+ URL = "https://guillaumejaume.github.io/FUNSD/dataset.zip"
+ SHA256 = "c31735649e4f441bcbb4fd0f379574f7520b42286e80b01d80b445649d54761f"
+ FILE_NAME = "funsd.zip"
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ super().__init__(
+ self.URL,
+ self.FILE_NAME,
+ self.SHA256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- super().__init__(self.URL, self.FILE_NAME, self.SHA256, True, **kwargs)
self.train = train
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
+ np_dtype = np.float32
# Use the subset
- subfolder = os.path.join('dataset', 'training_data' if train else 'testing_data')
+ subfolder = os.path.join("dataset", "training_data" if train else "testing_data")
# # List images
- self.root = os.path.join(self._root, subfolder, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
+ tmp_root = os.path.join(self.root, subfolder, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking FUNSD", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
+
stem = Path(img_path).stem
- with open(os.path.join(self._root, subfolder, 'annotations', f"{stem}.json"), 'rb') as f:
+ with open(os.path.join(self.root, subfolder, "annotations", f"{stem}.json"), "rb") as f:
data = json.load(f)
- _targets = [(word['text'], word['box']) for block in data['form']
- for word in block['words'] if len(word['text']) > 0]
-
+ _targets = [
+ (word["text"], word["box"])
+ for block in data["form"]
+ for word in block["words"]
+ if len(word["text"]) > 0
+ ]
text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.int), labels=text_targets)))
+ if use_polygons:
+ # xmin, ymin, xmax, ymax -> (x, y) coordinates of top left, top right, bottom right, bottom left corners
+ box_targets = [ # type: ignore[assignment]
+ [
+ [box[0], box[1]],
+ [box[2], box[1]],
+ [box[2], box[3]],
+ [box[0], box[3]],
+ ]
+ for box in box_targets
+ ]
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(
+ img_path=os.path.join(tmp_root, img_path), geoms=np.asarray(box_targets, dtype=np_dtype)
+ )
+ for crop, label in zip(crops, list(text_targets)):
+ # filter labels with unknown characters
+ if not any(char in label for char in ["☑", "☐", "\uf703", "\uf702"]):
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, np.asarray(box_targets, dtype=np_dtype)))
+ else:
+ self.data.append((
+ img_path,
+ dict(boxes=np.asarray(box_targets, dtype=np_dtype), labels=list(text_targets)),
+ ))
+
+ self.root = tmp_root
def extra_repr(self) -> str:
- return f"train={self.train}"
-
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
-
- return images, list(targets)
+ return f"train={self.train}"
Source code for doctr.datasets.generator.tensorflow
Source code for doctr.datasets.ic03
Source code for doctr.datasets.ic13
Source code for doctr.datasets.iiit5k
Source code for doctr.datasets.iiithws
Source code for doctr.datasets.imgur5k
Source code for doctr.datasets.loader
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import math
-import tensorflow as tf
-import numpy as np
-from typing import List, Tuple, Dict, Any, Optional
+from typing import Callable, Optional
-from .multithreading import multithread_exec
+import numpy as np
+import tensorflow as tf
__all__ = ["DataLoader"]
@@ -288,12 +314,13 @@ Source code for doctr.datasets.loader
"""Collate multiple elements into batches
Args:
+ ----
samples: list of N tuples containing M elements
Returns:
+ -------
Tuple of M sequences containing N elements each
"""
-
batch_data = zip(*samples)
tf_data = tuple(tf.stack(elt, axis=0) for elt in batch_data)
@@ -302,23 +329,23 @@ Source code for doctr.datasets.loader
-[docs]
+[docs]
class DataLoader:
"""Implements a dataset wrapper for fast data loading
- Example::
- >>> from doctr.datasets import FUNSD, DataLoader
- >>> train_set = CORD(train=True, download=True)
- >>> train_loader = DataLoader(train_set, batch_size=32)
- >>> train_iter = iter(train_loader)
- >>> images, targets = next(train_iter)
+ >>> from doctr.datasets import CORD, DataLoader
+ >>> train_set = CORD(train=True, download=True)
+ >>> train_loader = DataLoader(train_set, batch_size=32)
+ >>> train_iter = iter(train_loader)
+ >>> images, targets = next(train_iter)
Args:
+ ----
dataset: the dataset
shuffle: whether the samples should be shuffled before passing it to the iterator
batch_size: number of elements in each batch
drop_last: if `True`, drops the last batch if it isn't full
- workers: number of workers to use for data loading
+ collate_fn: function to merge samples into a batch
"""
def __init__(
@@ -327,17 +354,22 @@ Source code for doctr.datasets.loader
shuffle: bool = True,
batch_size: int = 1,
drop_last: bool = False,
- workers: Optional[int] = None,
+ collate_fn: Optional[Callable] = None,
) -> None:
self.dataset = dataset
self.shuffle = shuffle
self.batch_size = batch_size
nb = len(self.dataset) / batch_size
self.num_batches = math.floor(nb) if drop_last else math.ceil(nb)
- self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, 'collate_fn') else default_collate
- self.workers = workers
+ if collate_fn is None:
+ self.collate_fn = self.dataset.collate_fn if hasattr(self.dataset, "collate_fn") else default_collate
+ else:
+ self.collate_fn = collate_fn
self.reset()
+ def __len__(self) -> int:
+ return self.num_batches
+
def reset(self) -> None:
# Updates indices after each epoch
self._num_yielded = 0
@@ -353,9 +385,9 @@ Source code for doctr.datasets.loader
if self._num_yielded < self.num_batches:
# Get next indices
idx = self._num_yielded * self.batch_size
- indices = self.indices[idx: min(len(self.dataset), idx + self.batch_size)]
+ indices = self.indices[idx : min(len(self.dataset), idx + self.batch_size)]
- samples = multithread_exec(self.dataset.__getitem__, indices, threads=self.workers)
+ samples = list(map(self.dataset.__getitem__, indices))
batch_data = self.collate_fn(samples)
@@ -396,8 +428,8 @@ Source code for doctr.datasets.loader
diff --git a/v0.2.0/_modules/doctr/datasets/mjsynth.html b/v0.2.0/_modules/doctr/datasets/mjsynth.html
index 77bb01d523..df34e49cf9 100644
--- a/v0.2.0/_modules/doctr/datasets/mjsynth.html
+++ b/v0.2.0/_modules/doctr/datasets/mjsynth.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.mjsynth - docTR documentation
@@ -438,7 +438,7 @@ Source code for doctr.datasets.mjsynth
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/ocr.html b/v0.2.0/_modules/doctr/datasets/ocr.html
index 5832933ea5..ce1ed8b0d4 100644
--- a/v0.2.0/_modules/doctr/datasets/ocr.html
+++ b/v0.2.0/_modules/doctr/datasets/ocr.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.ocr - docTR documentation
@@ -403,7 +403,7 @@ Source code for doctr.datasets.ocr
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/recognition.html b/v0.2.0/_modules/doctr/datasets/recognition.html
index 512c70c308..1754789364 100644
--- a/v0.2.0/_modules/doctr/datasets/recognition.html
+++ b/v0.2.0/_modules/doctr/datasets/recognition.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.recognition - docTR documentation
@@ -388,7 +388,7 @@ Source code for doctr.datasets.recognition
-
+
diff --git a/v0.2.0/_modules/doctr/datasets/sroie.html b/v0.2.0/_modules/doctr/datasets/sroie.html
index 97f29ccdda..04cf10bda2 100644
--- a/v0.2.0/_modules/doctr/datasets/sroie.html
+++ b/v0.2.0/_modules/doctr/datasets/sroie.html
@@ -13,7 +13,7 @@
-
+
doctr.datasets.sroie - docTR documentation
@@ -225,15 +225,42 @@
Source code for doctr.datasets.sroie
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-import os
import csv
-import numpy as np
+import os
from pathlib import Path
-from typing import List, Dict, Any, Tuple, Optional, Callable
-import tensorflow as tf
+from typing import Any, Dict, List, Tuple, Union
+
+import numpy as np
+from tqdm import tqdm
-from doctr.documents.reader import read_img
-from .core import VisionDataset
+from .datasets import VisionDataset
+from .utils import convert_target_to_relative, crop_bboxes_from_image
-__all__ = ['SROIE']
+__all__ = ["SROIE"]
-[docs]
+[docs]
class SROIE(VisionDataset):
"""SROIE dataset from `"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"
<https://arxiv.org/pdf/2103.10213.pdf>`_.
- Example::
- >>> from doctr.datasets import SROIE
- >>> train_set = SROIE(train=True, download=True)
- >>> img, target = train_set[0]
+ .. image:: https://doctr-static.mindee.com/models?id=v0.5.0/sroie-grid.png&src=0
+ :align: center
+
+ >>> from doctr.datasets import SROIE
+ >>> train_set = SROIE(train=True, download=True)
+ >>> img, target = train_set[0]
Args:
+ ----
train: whether the subset should be the training one
- sample_transforms: composable transformations that will be applied to each image
+ use_polygons: whether polygons should be considered as rotated bounding box (instead of straight ones)
+ recognition_task: whether the dataset should be used for recognition task
+ detection_task: whether the dataset should be used for detection task
**kwargs: keyword arguments from `VisionDataset`.
"""
- TRAIN = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_train_task1.zip',
- 'd4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f')
- TEST = ('https://github.com/mindee/doctr/releases/download/v0.1.1/sroie2019_test.zip',
- '41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2')
+ TRAIN = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_train_task1.zip&src=0",
+ "d4fa9e60abb03500d83299c845b9c87fd9c9430d1aeac96b83c5d0bb0ab27f6f",
+ "sroie2019_train_task1.zip",
+ )
+ TEST = (
+ "https://doctr-static.mindee.com/models?id=v0.1.1/sroie2019_test.zip&src=0",
+ "41b3c746a20226fddc80d86d4b2a903d43b5be4f521dd1bbe759dbf8844745e2",
+ "sroie2019_test.zip",
+ )
def __init__(
self,
train: bool = True,
- sample_transforms: Optional[Callable[[tf.Tensor], tf.Tensor]] = None,
+ use_polygons: bool = False,
+ recognition_task: bool = False,
+ detection_task: bool = False,
**kwargs: Any,
) -> None:
+ url, sha256, name = self.TRAIN if train else self.TEST
+ super().__init__(
+ url,
+ name,
+ sha256,
+ True,
+ pre_transforms=convert_target_to_relative if not recognition_task else None,
+ **kwargs,
+ )
+ if recognition_task and detection_task:
+ raise ValueError(
+ "`recognition_task` and `detection_task` cannot be set to True simultaneously. "
+ + "To get the whole dataset with boxes and labels leave both parameters to False."
+ )
- url, sha256 = self.TRAIN if train else self.TEST
- super().__init__(url, None, sha256, True, **kwargs)
- self.sample_transforms = (lambda x: x) if sample_transforms is None else sample_transforms
self.train = train
- # # List images
- self.root = os.path.join(self._root, 'images')
- self.data: List[Tuple[str, Dict[str, Any]]] = []
- for img_path in os.listdir(self.root):
- stem = Path(img_path).stem
- _targets = []
- with open(os.path.join(self._root, 'annotations', f"{stem}.txt"), encoding='latin') as f:
- for row in csv.reader(f, delimiter=','):
- # Safeguard for blank lines
- if len(row) > 0:
- # Label may contain commas
- label = ",".join(row[8:])
- # Reduce 8 coords to 4
- p1_x, p1_y, p2_x, p2_y, p3_x, p3_y, p4_x, p4_y = map(int, row[:8])
- left, right = min(p1_x, p2_x, p3_x, p4_x), max(p1_x, p2_x, p3_x, p4_x)
- top, bot = min(p1_y, p2_y, p3_y, p4_y), max(p1_y, p2_y, p3_y, p4_y)
- if len(label) > 0:
- _targets.append((label, [left, top, right, bot]))
-
- text_targets, box_targets = zip(*_targets)
-
- self.data.append((img_path, dict(boxes=np.asarray(box_targets, dtype=np.float32), labels=text_targets)))
+ tmp_root = os.path.join(self.root, "images")
+ self.data: List[Tuple[Union[str, np.ndarray], Union[str, Dict[str, Any], np.ndarray]]] = []
+ np_dtype = np.float32
- def extra_repr(self) -> str:
- return f"train={self.train}"
+ for img_path in tqdm(iterable=os.listdir(tmp_root), desc="Unpacking SROIE", total=len(os.listdir(tmp_root))):
+ # File existence check
+ if not os.path.exists(os.path.join(tmp_root, img_path)):
+ raise FileNotFoundError(f"unable to locate {os.path.join(tmp_root, img_path)}")
- def __getitem__(self, index: int) -> Tuple[tf.Tensor, Dict[str, Any]]:
- img_name, target = self.data[index]
- # Read image
- img = tf.io.read_file(os.path.join(self.root, img_name))
- img = tf.image.decode_jpeg(img, channels=3)
- img = self.sample_transforms(img)
-
- return img, target
-
- @staticmethod
- def collate_fn(samples: List[Tuple[tf.Tensor, Dict[str, Any]]]) -> Tuple[tf.Tensor, List[Dict[str, Any]]]:
-
- images, targets = zip(*samples)
- images = tf.stack(images, axis=0)
+ stem = Path(img_path).stem
+ with open(os.path.join(self.root, "annotations", f"{stem}.txt"), encoding="latin") as f:
+ _rows = [row for row in list(csv.reader(f, delimiter=",")) if len(row) > 0]
+
+ labels = [",".join(row[8:]) for row in _rows]
+ # reorder coordinates (8 -> (4,2) ->
+ # (x, y) coordinates of top left, top right, bottom right, bottom left corners) and filter empty lines
+ coords: np.ndarray = np.stack(
+ [np.array(list(map(int, row[:8])), dtype=np_dtype).reshape((4, 2)) for row in _rows], axis=0
+ )
+
+ if not use_polygons:
+ # xmin, ymin, xmax, ymax
+ coords = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
+
+ if recognition_task:
+ crops = crop_bboxes_from_image(img_path=os.path.join(tmp_root, img_path), geoms=coords)
+ for crop, label in zip(crops, labels):
+ if crop.shape[0] > 0 and crop.shape[1] > 0 and len(label) > 0:
+ self.data.append((crop, label))
+ elif detection_task:
+ self.data.append((img_path, coords))
+ else:
+ self.data.append((img_path, dict(boxes=coords, labels=labels)))
+
+ self.root = tmp_root
- return images, list(targets)
+ def extra_repr(self) -> str:
+ return f"train={self.train}"
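
For illustration, the coordinate handling from this hunk in plain numpy (the row values are hypothetical):

import numpy as np

row = ["10", "20", "110", "22", "108", "60", "8", "58"]  # row[:8] from the CSV
poly = np.array(list(map(int, row)), dtype=np.float32).reshape((4, 2))
coords = np.stack([poly], axis=0)  # shape (N, 4, 2), N = 1 here
# straight-box reduction when use_polygons=False: xmin, ymin, xmax, ymax
straight = np.concatenate((coords.min(axis=1), coords.max(axis=1)), axis=1)
print(straight)  # [[  8.  20. 110.  60.]]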
Source code for doctr.datasets.svhn
Source code for doctr.datasets.svt
Source code for doctr.datasets.synthtext
Source code for doctr.datasets.utils
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
import string
import unicodedata
+from collections.abc import Sequence
+from functools import partial
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple, TypeVar, Union
+from typing import Sequence as SequenceType
+
import numpy as np
-from typing import List, Optional, Any
+from PIL import Image
+
+from doctr.io.image import get_img_shape
+from doctr.utils.geometry import convert_to_relative_coords, extract_crops, extract_rcrops
from .vocabs import VOCABS
-__all__ = ['translate', 'encode_sequence', 'decode_sequence', 'encode_sequences']
+__all__ = ["translate", "encode_string", "decode_sequence", "encode_sequences", "pre_transform_multiclass"]
+
+ImageTensor = TypeVar("ImageTensor")
def translate(
input_string: str,
vocab_name: str,
- unknown_char: str = '■',
+ unknown_char: str = "■",
) -> str:
"""Translate a string input in a given vocabulary
Args:
+ ----
input_string: input string to translate
vocab_name: vocabulary to use (french, latin, ...)
unknown_char: unknown character for non-translatable characters
Returns:
- A string translated in a given vocab"""
-
+ -------
+ A string translated in a given vocab
+ """
if VOCABS.get(vocab_name) is None:
raise KeyError("output vocabulary must be in vocabs dictionary")
- translated = ''
+ translated = ""
for char in input_string:
if char not in VOCABS[vocab_name]:
# we need to translate char into a vocab char
@@ -310,85 +350,177 @@ Source code for doctr.datasets.utils
# remove whitespaces
continue
# normalize character if it is not in vocab
- char = unicodedata.normalize('NFD', char).encode('ascii', 'ignore').decode('ascii')
- if char == '' or char not in VOCABS[vocab_name]:
+ char = unicodedata.normalize("NFD", char).encode("ascii", "ignore").decode("ascii")
+ if char == "" or char not in VOCABS[vocab_name]:
# if normalization fails or char still not in vocab, return unknown character)
char = unknown_char
translated += char
return translated
-def encode_sequence(
+def encode_string(
input_string: str,
vocab: str,
-) -> List[str]:
+) -> List[int]:
"""Given a predefined mapping, encode the string to a sequence of numbers
Args:
+ ----
input_string: string to encode
vocab: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A list encoding the input_string"""
-
- return list(map(vocab.index, input_string))
+ -------
+ A list encoding the input_string
+ """
+ try:
+ return list(map(vocab.index, input_string))
+ except ValueError:
+ raise ValueError(
+ f"some characters cannot be found in 'vocab'. \
+ Please check the input string {input_string} and the vocabulary {vocab}"
+ )
def decode_sequence(
- input_array: np.array,
+ input_seq: Union[np.ndarray, SequenceType[int]],
mapping: str,
) -> str:
"""Given a predefined mapping, decode the sequence of numbers to a string
Args:
- input_array: array to decode
+ ----
+ input_seq: array to decode
mapping: vocabulary (string), the encoding is given by the indexing of the character sequence
Returns:
- A string, decoded from input_array"""
-
- if not input_array.dtype == np.int_ or input_array.max() >= len(mapping):
+ -------
+ A string, decoded from input_seq
+ """
+ if not isinstance(input_seq, (Sequence, np.ndarray)):
+ raise TypeError("Invalid sequence type")
+ if isinstance(input_seq, np.ndarray) and (input_seq.dtype != np.int_ or input_seq.max() >= len(mapping)):
raise AssertionError("Input must be an array of int, with max less than mapping size")
- decoded = ''.join(mapping[idx] for idx in input_array)
- return decoded
+
+ return "".join(map(mapping.__getitem__, input_seq))
-[docs]
+[docs]
def encode_sequences(
sequences: List[str],
vocab: str,
target_size: Optional[int] = None,
eos: int = -1,
- **kwargs: Any,
+ sos: Optional[int] = None,
+ pad: Optional[int] = None,
+ dynamic_seq_length: bool = False,
) -> np.ndarray:
"""Encode character sequences using a given vocab as mapping
Args:
+ ----
sequences: the list of character sequences of size N
vocab: the ordered vocab to use for encoding
target_size: maximum length of the encoded data
eos: encoding of End Of String
+ sos: optional encoding of Start Of String
+ pad: optional encoding for padding. In case of padding, all sequences are followed by 1 EOS then PAD
+ dynamic_seq_length: if `target_size` is specified, uses it as upper bound and enables dynamic sequence size
Returns:
+ -------
the padded encoded data as a tensor
"""
-
if 0 <= eos < len(vocab):
raise ValueError("argument 'eos' needs to be outside of vocab possible indices")
- if not isinstance(target_size, int):
- target_size = max(len(w) for w in sequences)
+ if not isinstance(target_size, int) or dynamic_seq_length:
+ # Maximum string length + EOS
+ max_length = max(len(w) for w in sequences) + 1
+ if isinstance(sos, int):
+ max_length += 1
+ if isinstance(pad, int):
+ max_length += 1
+ target_size = max_length if not isinstance(target_size, int) else min(max_length, target_size)
# Pad all sequences
- encoded_data = np.full([len(sequences), target_size], eos, dtype=np.int32)
-
- for idx, seq in enumerate(sequences):
- encoded_seq = encode_sequence(seq, vocab)
- encoded_data[idx, :min(len(encoded_seq), target_size)] = encoded_seq[:min(len(encoded_seq), target_size)]
+ if isinstance(pad, int): # pad with padding symbol
+ if 0 <= pad < len(vocab):
+ raise ValueError("argument 'pad' needs to be outside of vocab possible indices")
+ # In that case, add EOS at the end of the word before padding
+ default_symbol = pad
+ else: # pad with eos symbol
+ default_symbol = eos
+ encoded_data: np.ndarray = np.full([len(sequences), target_size], default_symbol, dtype=np.int32)
+
+ # Encode the strings
+ for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
+ if isinstance(pad, int): # add eos at the end of the sequence
+ seq.append(eos)
+ encoded_data[idx, : min(len(seq), target_size)] = seq[: min(len(seq), target_size)]
+
+ if isinstance(sos, int): # place sos symbol at the beginning of each sequence
+ if 0 <= sos < len(vocab):
+ raise ValueError("argument 'sos' needs to be outside of vocab possible indices")
+ encoded_data = np.roll(encoded_data, 1)
+ encoded_data[:, 0] = sos
return encoded_data
+
+
+def convert_target_to_relative(
+ img: ImageTensor, target: Union[np.ndarray, Dict[str, Any]]
+) -> Tuple[ImageTensor, Union[Dict[str, Any], np.ndarray]]:
+ if isinstance(target, np.ndarray):
+ target = convert_to_relative_coords(target, get_img_shape(img))
+ else:
+ target["boxes"] = convert_to_relative_coords(target["boxes"], get_img_shape(img))
+ return img, target
+
+
+def crop_bboxes_from_image(img_path: Union[str, Path], geoms: np.ndarray) -> List[np.ndarray]:
+ """Crop a set of bounding boxes from an image
+
+ Args:
+ ----
+ img_path: path to the image
+ geoms: a array of polygons of shape (N, 4, 2) or of straight boxes of shape (N, 4)
+
+ Returns:
+ -------
+ a list of cropped images
+ """
+ with Image.open(img_path) as pil_img:
+ img: np.ndarray = np.array(pil_img.convert("RGB"))
+ # Polygon
+ if geoms.ndim == 3 and geoms.shape[1:] == (4, 2):
+ return extract_rcrops(img, geoms.astype(dtype=int))
+ if geoms.ndim == 2 and geoms.shape[1] == 4:
+ return extract_crops(img, geoms.astype(dtype=int))
+ raise ValueError("Invalid geometry format")
+
+
+def pre_transform_multiclass(img, target: Tuple[np.ndarray, List]) -> Tuple[np.ndarray, Dict[str, List]]:
+ """Converts multiclass target to relative coordinates.
+
+ Args:
+ ----
+ img: Image
+ target: tuple of target polygons and their classes names
+
+ Returns:
+ -------
+ Image and dictionary of boxes, with class names as keys
+ """
+ boxes = convert_to_relative_coords(target[0], get_img_shape(img))
+ boxes_classes = target[1]
+ boxes_dict: Dict = {k: [] for k in sorted(set(boxes_classes))}
+ for k, poly in zip(boxes_classes, boxes):
+ boxes_dict[k].append(poly)
+ boxes_dict = {k: np.stack(v, axis=0) for k, v in boxes_dict.items()}
+ return img, boxes_dict
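
For illustration, a sketch of the reworked encode_sequences from this hunk; eos, sos and pad must lie outside the vocab's index range (the values below are arbitrary):

from doctr.datasets.utils import encode_sequences

vocab = "abcdefghijklmnopqrstuvwxyz"  # indices 0..25
out = encode_sequences(["cab", "be"], vocab=vocab, eos=26, sos=27, pad=28)
# each word is encoded and followed by one EOS, remaining slots take PAD,
# and an SOS symbol ends up at position 0 of each row
print(out.shape)  # (2, 6): longest word (3) + EOS + SOS + PAD slots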
Source code for doctr.datasets.wildreceipt
Source code for doctr.documents.elements
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import matplotlib.pyplot as plt
-from typing import Tuple, Dict, List, Any, Optional
-
-from doctr.utils.geometry import resolve_enclosing_bbox
-from doctr.utils.visualization import visualize_page
-from doctr.utils.common_types import BoundingBox
-from doctr.utils.repr import NestedObject
-
-__all__ = ['Element', 'Word', 'Artefact', 'Line', 'Block', 'Page', 'Document']
-
-
-class Element(NestedObject):
- """Implements an abstract document element with exporting and text rendering capabilities"""
-
- _exported_keys: List[str] = []
-
- def __init__(self, **kwargs: Any) -> None:
- self._children_names: List[str] = []
- for k, v in kwargs.items():
- setattr(self, k, v)
- self._children_names.append(k)
-
- def export(self) -> Dict[str, Any]:
- """Exports the object into a nested dict format"""
-
- export_dict = {k: getattr(self, k) for k in self._exported_keys}
- for children_name in self._children_names:
- export_dict[children_name] = [c.export() for c in getattr(self, children_name)]
-
- return export_dict
-
- def render(self) -> str:
- raise NotImplementedError
-
-
-
-[docs]
-class Word(Element):
- """Implements a word element
-
- Args:
- value: the text string of the word
- confidence: the confidence associated with the text prediction
- geometry: bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size
- """
-
- _exported_keys: List[str] = ["value", "confidence", "geometry"]
-
- def __init__(self, value: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.value = value
- self.confidence = confidence
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return self.value
-
- def extra_repr(self) -> str:
- return f"value='{self.value}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Artefact(Element):
- """Implements a non-textual element
-
- Args:
- artefact_type: the type of artefact
- confidence: the confidence of the type prediction
- geometry: bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size.
- """
-
- _exported_keys: List[str] = ["geometry", "type", "confidence"]
-
- def __init__(self, artefact_type: str, confidence: float, geometry: BoundingBox) -> None:
- super().__init__()
- self.geometry = geometry
- self.type = artefact_type
- self.confidence = confidence
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return f"[{self.type.upper()}]"
-
- def extra_repr(self) -> str:
- return f"type='{self.type}', confidence={self.confidence:.2}"
-
-
-
-
-[docs]
-class Line(Element):
- """Implements a line element as a collection of words
-
- Args:
- words: list of word elements
- geometry: bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all words in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- words: List[Word] = []
-
- def __init__(
- self,
- words: List[Word],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- geometry = resolve_enclosing_bbox([w.geometry for w in words])
-
- super().__init__(words=words)
- self.geometry = geometry
-
- def render(self) -> str:
- """Renders the full text of the element"""
- return " ".join(w.render() for w in self.words)
-
-
-
-
-[docs]
-class Block(Element):
- """Implements a block element as a collection of lines and artefacts
-
- Args:
- lines: list of line elements
- artefacts: list of artefacts
- geometry: bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
- the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing
- all lines and artefacts in it.
- """
-
- _exported_keys: List[str] = ["geometry"]
- lines: List[Line] = []
- artefacts: List[Artefact] = []
-
- def __init__(
- self,
- lines: List[Line] = [],
- artefacts: List[Artefact] = [],
- geometry: Optional[BoundingBox] = None,
- ) -> None:
- # Resolve the geometry using the smallest enclosing bounding box
- if geometry is None:
- line_boxes = [word.geometry for line in lines for word in line.words]
- artefact_boxes = [artefact.geometry for artefact in artefacts]
- geometry = resolve_enclosing_bbox(line_boxes + artefact_boxes)
- super().__init__(lines=lines, artefacts=artefacts)
- self.geometry = geometry
-
- def render(self, line_break: str = '\n') -> str:
- """Renders the full text of the element"""
- return line_break.join(line.render() for line in self.lines)
-
-
-
-
-[docs]
-class Page(Element):
- """Implements a page element as a collection of blocks
-
- Args:
- blocks: list of block elements
- page_idx: the index of the page in the input raw document
- dimensions: the page size in pixels in format (width, height)
- orientation: a dictionary with the value of the rotation angle in degrees and the confidence of the prediction
- language: a dictionary with the language value and confidence of the prediction
- """
-
- _exported_keys: List[str] = ["page_idx", "dimensions", "orientation", "language"]
- blocks: List[Block] = []
-
- def __init__(
- self,
- blocks: List[Block],
- page_idx: int,
- dimensions: Tuple[int, int],
- orientation: Optional[Dict[str, Any]] = None,
- language: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(blocks=blocks)
- self.page_idx = page_idx
- self.dimensions = dimensions
- self.orientation = orientation if isinstance(orientation, dict) else dict(value=None, confidence=None)
- self.language = language if isinstance(language, dict) else dict(value=None, confidence=None)
-
- def render(self, block_break: str = '\n\n') -> str:
- """Renders the full text of the element"""
- return block_break.join(b.render() for b in self.blocks)
-
- def extra_repr(self) -> str:
- return f"dimensions={self.dimensions}"
-
- def show(self, page: np.ndarray, interactive: bool = True, **kwargs) -> None:
- visualize_page(self.export(), page, interactive=interactive)
- plt.show(**kwargs)
-
-
-
-
-[docs]
-class Document(Element):
- """Implements a document element as a collection of pages
-
- Args:
- pages: list of page elements
- """
-
- pages: List[Page] = []
-
- def __init__(
- self,
- pages: List[Page],
- ) -> None:
- super().__init__(pages=pages)
-
- def render(self, page_break: str = '\n\n\n\n') -> str:
- """Renders the full text of the element"""
- return page_break.join(p.render() for p in self.pages)
-
- def show(self, pages: List[np.ndarray], **kwargs) -> None:
- """Plot the results"""
- for img, result in zip(pages, self.pages):
- result.show(img, **kwargs)
-
-
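As a quick illustration of the element hierarchy above, the sketch below builds a one-line document by hand and renders it. The import path mirrors this module (assuming doctr.documents re-exports it, as the examples elsewhere on this page suggest) and the relative geometries are made up.

from doctr.documents import Word, Line, Block, Page, Document

words = [
    Word("Hello", 0.99, ((0.10, 0.10), (0.25, 0.15))),
    Word("world", 0.98, ((0.27, 0.10), (0.40, 0.15))),
]
line = Line(words)                 # geometry resolved as the enclosing bbox of the words
block = Block(lines=[line])        # likewise resolved from its lines (and artefacts)
page = Page(blocks=[block], page_idx=0, dimensions=(595, 842))  # (width, height) in pixels
doc = Document(pages=[page])

print(doc.render())    # "Hello world"
export = doc.export()  # nested dict mirroring the element tree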
Source code for doctr.documents.reader
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import numpy as np
-import cv2
-from pathlib import Path
-import fitz
-from weasyprint import HTML
-from typing import List, Tuple, Optional, Any, Union, Sequence
-
-__all__ = ['read_pdf', 'read_img', 'read_html', 'DocumentFile', 'PDF']
-
-
-AbstractPath = Union[str, Path]
-AbstractFile = Union[AbstractPath, bytes]
-Bbox = Tuple[float, float, float, float]
-
-
-
-[docs]
-def read_img(
- file: AbstractFile,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
-) -> np.ndarray:
- """Read an image file into numpy format
-
- Example::
- >>> from doctr.documents import read_img
- >>> page = read_img("path/to/your/doc.jpg")
-
- Args:
- file: the path to the image file
- output_size: the expected output size of each page in format H x W
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- Returns:
- the page decoded as numpy ndarray of shape H x W x 3
- """
-
- if isinstance(file, (str, Path)):
- if not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
- img = cv2.imread(str(file), cv2.IMREAD_COLOR)
- elif isinstance(file, bytes):
- file = np.frombuffer(file, np.uint8)
- img = cv2.imdecode(file, cv2.IMREAD_COLOR)
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Validity check
- if img is None:
- raise ValueError("unable to read file.")
- # Resizing
- if isinstance(output_size, tuple):
- img = cv2.resize(img, output_size[::-1], interpolation=cv2.INTER_LINEAR)
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- return img
-
-
-
-
-[docs]
-def read_pdf(file: AbstractFile, **kwargs: Any) -> fitz.Document:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_pdf
- >>> doc = read_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file
- Returns:
- the PDF document loaded as a PyMuPDF (fitz) Document
- """
-
- if isinstance(file, (str, Path)) and not Path(file).is_file():
- raise FileNotFoundError(f"unable to access {file}")
-
- fitz_args = {}
-
- if isinstance(file, (str, Path)):
- fitz_args['filename'] = file
- elif isinstance(file, bytes):
- fitz_args['stream'] = file
- else:
- raise TypeError("unsupported object type for argument 'file'")
-
- # Read pages with fitz and convert them to numpy ndarrays
- return fitz.open(**fitz_args, filetype="pdf", **kwargs)
-
-
-
-def convert_page_to_numpy(
- page: fitz.fitz.Page,
- output_size: Optional[Tuple[int, int]] = None,
- rgb_output: bool = True,
- default_scales: Tuple[float, float] = (2, 2),
-) -> np.ndarray:
- """Convert a fitz page to a numpy-formatted image
-
- Args:
- page: the page of a file read with PyMuPDF
- output_size: the expected output size of each page in format H x W. Default goes to 840 x 595 for an A4 pdf;
- to increase the resolution while preserving the original A4 aspect ratio, you can pass (1024, 726)
- rgb_output: whether the output ndarray channel order should be RGB instead of BGR.
- default_scales: spatial scaling to be applied when output_size is not specified where (1, 1)
- corresponds to 72 dpi rendering.
-
- Returns:
- the rendered image in numpy format
- """
-
- # If no output size is specified, fall back to the default scales
- if output_size is not None:
- scales = (output_size[1] / page.MediaBox[2], output_size[0] / page.MediaBox[3])
- else:
- # Default 72 DPI (scales of (1, 1)) is unnecessarily low
- scales = default_scales
-
- transform_matrix = fitz.Matrix(*scales)
-
- # Generate the pixel map using the transformation matrix
- pixmap = page.getPixmap(matrix=transform_matrix)
- # Decode it into a numpy
- img = np.frombuffer(pixmap.samples, dtype=np.uint8).reshape(pixmap.height, pixmap.width, 3)
-
- # Switch the channel order
- if rgb_output:
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-
- return img
-
-
-
-[docs]
-def read_html(url: str, **kwargs: Any) -> bytes:
- """Read a PDF file and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import read_html
- >>> doc = read_html("https://www.yoursite.com")
-
- Args:
- url: URL of the target web page
- Returns:
- decoded PDF file as a bytes stream
- """
-
- return HTML(url, **kwargs).write_pdf()
-
-
-
-
-[docs]
-class PDF:
- """PDF document template
-
- Args:
- doc: input PDF document
- """
- def __init__(self, doc: fitz.Document) -> None:
- self.doc = doc
-
-
-[docs]
- def as_images(self, **kwargs) -> List[np.ndarray]:
- """Convert all document pages to images
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
- Args:
- kwargs: keyword arguments of `convert_page_to_numpy`
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- return [convert_page_to_numpy(page, **kwargs) for page in self.doc]
-
-
- def get_page_words(self, idx, **kwargs) -> List[Tuple[Bbox, str]]:
- """Get the annotations for all words of a given page"""
-
- # xmin, ymin, xmax, ymax, value, block_idx, line_idx, word_idx
- return [(info[:4], info[4]) for info in self.doc[idx].getTextWords(**kwargs)]
-
-
-[docs]
- def get_words(self, **kwargs) -> List[List[Tuple[Bbox, str]]]:
- """Get the annotations for all words in the document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
- Args:
- kwargs: keyword arguments of `fitz.Page.getTextWords`
- Returns:
- the list of pages annotations, represented as a list of tuple (bounding box, value)
- """
- return [self.get_page_words(idx, **kwargs) for idx in range(len(self.doc))]
-
-
- def get_page_artefacts(self, idx) -> List[Tuple[float, float, float, float]]:
- return [tuple(self.doc[idx].getImageBbox(artefact)) for artefact in self.doc[idx].get_images(full=True)]
-
-
-[docs]
- def get_artefacts(self) -> List[List[Tuple[float, float, float, float]]]:
- """Get the artefacts for the entire document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
- Returns:
- the list of pages artefacts, represented as a list of bounding boxes
- """
-
- return [self.get_page_artefacts(idx) for idx in range(len(self.doc))]
-
-
-
-
-
-[docs]
-class DocumentFile:
- """Read a document from multiple extensions"""
-
-
-[docs]
- @classmethod
- def from_pdf(cls, file: AbstractFile, **kwargs) -> PDF:
- """Read a PDF file
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
- Args:
- file: the path to the PDF file or a binary stream
- Returns:
- a PDF document
- """
-
- doc = read_pdf(file, **kwargs)
-
- return PDF(doc)
-
-
-
-[docs]
- @classmethod
- def from_url(cls, url: str, **kwargs) -> PDF:
- """Interpret a web page as a PDF document
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
- Args:
- url: the URL of the target web page
- Returns:
- a PDF document
- """
- pdf_stream = read_html(url)
- return cls.from_pdf(pdf_stream, **kwargs)
-
-
-
-[docs]
- @classmethod
- def from_images(cls, files: Union[Sequence[AbstractFile], AbstractFile], **kwargs) -> List[np.ndarray]:
- """Read an image file (or a collection of image files) and convert it into an image in numpy format
-
- Example::
- >>> from doctr.documents import DocumentFile
- >>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
- Args:
- files: the path to the image file or a binary stream, or a collection of those
- Returns:
- the list of pages decoded as numpy ndarray of shape H x W x 3
- """
- if isinstance(files, (str, Path, bytes)):
- files = [files]
-
- return [read_img(file, **kwargs) for file in files]
-
-
-
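Tying the readers above together, a short usage sketch of DocumentFile (paths and URL are placeholders):

from doctr.documents import DocumentFile

# PDF: pages rendered to H x W x 3 uint8 arrays, plus native text annotations
pdf = DocumentFile.from_pdf("path/to/your/doc.pdf")
pages = pdf.as_images()
words = pdf.get_words()  # per-page list of (bounding box, value) tuples

# Images: one or several files decoded to numpy arrays
pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])

# Web page: rendered to a PDF stream first, then handled like any PDF
doc = DocumentFile.from_url("https://www.yoursite.com")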
Source code for doctr.io.elements
Source code for doctr.io.html
Source code for doctr.io.image.base
Source code for doctr.io.image.tensorflow
Source code for doctr.io.pdf
Source code for doctr.io.reader
Source code for doctr.models.classification.magc_resnet.tensorflow
Source code for doctr.models.classification.mobilenet.tensorflow
Source code for doctr.models.classification.resnet.tensorflow
Source code for doctr.models.classification.textnet.tensorflow
Source code for doctr.models.classification.vgg.tensorflow
Source code for doctr.models.classification.vit.tensorflow
Source code for doctr.models.classification.zoo
Source code for doctr.models.detection.differentiable_binarization
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-import cv2
-from copy import deepcopy
-import numpy as np
-from shapely.geometry import Polygon
-import pyclipper
-import tensorflow as tf
-from tensorflow import keras
-from tensorflow.keras import layers
-from typing import Union, List, Tuple, Optional, Any, Dict
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..utils import IntermediateLayerGetter, load_pretrained_params, conv_sequence
-from doctr.utils.repr import NestedObject
-
-__all__ = ['DBPostProcessor', 'DBNet', 'db_resnet50']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'db_resnet50': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'backbone': 'ResNet50',
- 'fpn_layers': ["conv2_block3_out", "conv3_block4_out", "conv4_block6_out", "conv5_block3_out"],
- 'fpn_channels': 128,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'DBPostProcessor',
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/db_resnet50-98ba765d.zip',
- },
-}
-
-
-class DBPostProcessor(DetectionPostProcessor):
- """Implements a post processor for DBNet adapted from the implementation of `xuannianz
- <https://github.com/xuannianz/DifferentiableBinarization>`_.
-
- Args:
- unclip_ratio: ratio used to expand (unshrink) polygons
- min_size_box: minimal length (pix) to keep a box
- max_candidates: maximum boxes to consider in a single page
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the p_map at inference time
-
- """
- def __init__(
- self,
- unclip_ratio: Union[float, int] = 1.5,
- max_candidates: int = 1000,
- box_thresh: float = 0.1,
- bin_thresh: float = 0.3,
- ) -> None:
-
- super().__init__(
- box_thresh,
- bin_thresh
- )
- self.unclip_ratio = unclip_ratio
- self.max_candidates = max_candidates
-
- def polygon_to_box(
- self,
- points: np.ndarray,
- ) -> Optional[Tuple[int, int, int, int]]:
- """Expand a polygon (points) by a factor unclip_ratio, and returns a 4-points box
-
- Args:
- points: the polygon vertices to expand, as an array of shape (N, 2)
-
- Returns:
- a box in absolute coordinates (x, y, w, h)
- """
- poly = Polygon(points)
- distance = poly.area * self.unclip_ratio / poly.length # compute distance to expand polygon
- offset = pyclipper.PyclipperOffset()
- offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- _points = offset.Execute(distance)
- # Take biggest stack of points
- idx = 0
- if len(_points) > 1:
- max_size = 0
- for _idx, p in enumerate(_points):
- if len(p) > max_size:
- idx = _idx
- max_size = len(p)
- # We ensure that _points can be correctly cast to an ndarray
- _points = [_points[idx]]
- expanded_points = np.asarray(_points) # expand polygon
- if len(expanded_points) < 1:
- return None
- x, y, w, h = cv2.boundingRect(expanded_points) # compute a 4-points box from expanded polygon
- return x, y, w, h
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map
-
- Args:
- pred: Pred map from differentiable binarization output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- a numpy array of boxes for the bitmap, each box being a 5-element list
- containing x, y, w, h, score for the box
- """
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- # get contours from connected components on the bitmap
- contours, _ = cv2.findContours(bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
- for contour in contours[:self.max_candidates]:
- # Check whether smallest enclosing bounding box is not too small
- if np.any(contour[:, 0].max(axis=0) - contour[:, 0].min(axis=0) < min_size_box):
- continue
- epsilon = 0.01 * cv2.arcLength(contour, True)
- approx = cv2.approxPolyDP(contour, epsilon, True) # approximate contour by a polygon
- points = approx.reshape((-1, 2)) # get polygon points
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- _box = self.polygon_to_box(points)
-
- if _box is None or _box[2] < min_size_box or _box[3] < min_size_box: # remove too-small boxes
- continue
- x, y, w, h = _box
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-class FeaturePyramidNetwork(layers.Layer, NestedObject):
- """Feature Pyramid Network as described in `"Feature Pyramid Networks for Object Detection"
- <https://arxiv.org/pdf/1612.03144.pdf>`_.
-
- Args:
- channels: number of channels to output
- """
-
- def __init__(
- self,
- channels: int,
- ) -> None:
- super().__init__()
- self.channels = channels
- self.upsample = layers.UpSampling2D(size=(2, 2), interpolation='nearest')
- self.inner_blocks = [layers.Conv2D(channels, 1, strides=1, kernel_initializer='he_normal') for _ in range(4)]
- self.layer_blocks = [self.build_upsampling(channels, dilation_factor=2 ** idx) for idx in range(4)]
-
- @staticmethod
- def build_upsampling(
- channels: int,
- dilation_factor: int = 1,
- ) -> layers.Layer:
- """Module which performs a 3x3 convolution followed by up-sampling
-
- Args:
- channels: number of output channels
- dilation_factor (int): dilation factor to scale the convolution output before concatenation
-
- Returns:
- a keras.layers.Layer object, wrapping these operations in a sequential module
-
- """
-
- _layers = conv_sequence(channels, 'relu', True, kernel_size=3)
-
- if dilation_factor > 1:
- _layers.append(layers.UpSampling2D(size=(dilation_factor, dilation_factor), interpolation='nearest'))
-
- module = keras.Sequential(_layers)
-
- return module
-
- def extra_repr(self) -> str:
- return f"channels={self.channels}"
-
- def call(
- self,
- x: List[tf.Tensor],
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # Channel mapping
- results = [block(fmap, **kwargs) for block, fmap in zip(self.inner_blocks, x)]
- # Upsample & sum
- for idx in range(len(results) - 2, -1, -1): # top-down: add each upsampled coarser map to the next finer one
- results[idx] += self.upsample(results[idx + 1])
- # Conv & upsample
- results = [block(fmap, **kwargs) for block, fmap in zip(self.layer_blocks, results)]
-
- return layers.concatenate(results)
-
-
-class DBNet(DetectionModel, NestedObject):
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- fpn_channels: number of channels each extracted feature map is mapped to
- """
-
- _children_names = ['feat_extractor', 'fpn', 'probability_head', 'threshold_head']
-
- def __init__(
- self,
- feature_extractor: IntermediateLayerGetter,
- fpn_channels: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(cfg=cfg)
-
- self.shrink_ratio = 0.4
- self.thresh_min = 0.3
- self.thresh_max = 0.7
- self.min_size_box = 3
-
- self.feat_extractor = feature_extractor
-
- self.fpn = FeaturePyramidNetwork(channels=fpn_channels)
- # Initialize kernels
- _inputs = [layers.Input(shape=in_shape[1:]) for in_shape in self.feat_extractor.output_shape]
- output_shape = tuple(self.fpn(_inputs).shape)
-
- self.probability_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
- self.threshold_head = keras.Sequential(
- [
- *conv_sequence(64, 'relu', True, kernel_size=3, input_shape=output_shape[1:]),
- layers.Conv2DTranspose(64, 2, strides=2, use_bias=False, kernel_initializer='he_normal'),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- layers.Conv2DTranspose(1, 2, strides=2, kernel_initializer='he_normal'),
- ]
- )
-
- self.postprocessor = DBPostProcessor()
-
- @staticmethod
- def compute_distance(
- xs: np.array,
- ys: np.array,
- a: np.array,
- b: np.array,
- eps: float = 1e-7,
- ) -> float:
- """Compute the distance for each point of the map (xs, ys) to the (a, b) segment
-
- Args:
- xs : map of x coordinates (height, width)
- ys : map of y coordinates (height, width)
- a: first point defining the [ab] segment
- b: second point defining the [ab] segment
-
- Returns:
- The computed distance
-
- """
- square_dist_1 = np.square(xs - a[0]) + np.square(ys - a[1])
- square_dist_2 = np.square(xs - b[0]) + np.square(ys - b[1])
- square_dist = np.square(a[0] - b[0]) + np.square(a[1] - b[1])
- cosin = (square_dist - square_dist_1 - square_dist_2) / (2 * np.sqrt(square_dist_1 * square_dist_2) + eps)
- square_sin = 1 - np.square(cosin)
- square_sin = np.nan_to_num(square_sin)
- result = np.sqrt(square_dist_1 * square_dist_2 * square_sin / square_dist)
- result[cosin < 0] = np.sqrt(np.fmin(square_dist_1, square_dist_2))[cosin < 0]
- return result
-
- def draw_thresh_map(
- self,
- polygon: np.array,
- canvas: np.array,
- mask: np.array,
- ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
- """Draw a polygon treshold map on a canvas, as described in the DB paper
-
- Args:
- polygon : array of coord., to draw the boundary of the polygon
- canvas : threshold map to fill with polygons
- mask : mask for training on threshold polygons
- """
- if polygon.ndim != 2 or polygon.shape[1] != 2:
- raise AttributeError("polygon should be a 2 dimensional array of coords")
-
- # Augment polygon by shrink_ratio
- polygon_shape = Polygon(polygon)
- distance = polygon_shape.area * (1 - np.power(self.shrink_ratio, 2)) / polygon_shape.length
- subject = [tuple(coor) for coor in polygon] # Get coord as list of tuples
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- padded_polygon = np.array(padding.Execute(distance)[0])
-
- # Fill the mask with 1 on the new padded polygon
- cv2.fillPoly(mask, [padded_polygon.astype(np.int32)], 1.0)
-
- # Get min/max to recover polygon after distance computation
- xmin = padded_polygon[:, 0].min()
- xmax = padded_polygon[:, 0].max()
- ymin = padded_polygon[:, 1].min()
- ymax = padded_polygon[:, 1].max()
- width = xmax - xmin + 1
- height = ymax - ymin + 1
- # Get absolute polygon for distance computation
- polygon[:, 0] = polygon[:, 0] - xmin
- polygon[:, 1] = polygon[:, 1] - ymin
- # Get absolute padded polygon
- xs = np.broadcast_to(np.linspace(0, width - 1, num=width).reshape(1, width), (height, width))
- ys = np.broadcast_to(np.linspace(0, height - 1, num=height).reshape(height, 1), (height, width))
-
- # Compute distance map to fill the padded polygon
- distance_map = np.zeros((polygon.shape[0], height, width), dtype=np.float32)
- for i in range(polygon.shape[0]):
- j = (i + 1) % polygon.shape[0]
- absolute_distance = self.compute_distance(xs, ys, polygon[i], polygon[j])
- distance_map[i] = np.clip(absolute_distance / distance, 0, 1)
- distance_map = np.min(distance_map, axis=0)
-
- # Clip the padded polygon inside the canvas
- xmin_valid = min(max(0, xmin), canvas.shape[1] - 1)
- xmax_valid = min(max(0, xmax), canvas.shape[1] - 1)
- ymin_valid = min(max(0, ymin), canvas.shape[0] - 1)
- ymax_valid = min(max(0, ymax), canvas.shape[0] - 1)
-
- # Fill the canvas with the distances computed inside the valid padded polygon
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1] = np.fmax(
- 1 - distance_map[
- ymin_valid - ymin:ymax_valid - ymin + 1,
- xmin_valid - xmin:xmax_valid - xmin + 1
- ],
- canvas[ymin_valid:ymax_valid + 1, xmin_valid:xmax_valid + 1]
- )
-
- return polygon, canvas, mask
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=np.uint8)
- seg_mask = np.ones(output_shape, dtype=bool)
- thresh_target = np.zeros(output_shape, dtype=np.uint8)
- thresh_mask = np.ones(output_shape, dtype=np.uint8)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- polys = np.stack([
- abs_boxes[:, [0, 1]],
- abs_boxes[:, [0, 3]],
- abs_boxes[:, [2, 3]],
- abs_boxes[:, [2, 1]],
- ], axis=1)
-
- for box, box_size, poly, is_ambiguous in zip(abs_boxes, boxes_size, polys, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
-
- # Negative shrink for gt, as described in paper
- polygon = Polygon(poly)
- distance = polygon.area * (1 - np.power(self.shrink_ratio, 2)) / polygon.length
- subject = [tuple(coor) for coor in poly]
- padding = pyclipper.PyclipperOffset()
- padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
- shrinked = padding.Execute(-distance)
-
- # Draw polygon on gt if it is valid
- if len(shrinked) == 0:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- shrinked = np.array(shrinked[0]).reshape(-1, 2)
- if shrinked.shape[0] <= 2 or not Polygon(shrinked).is_valid:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- cv2.fillPoly(seg_target[idx], [shrinked.astype(np.int32)], 1)
-
- # Draw on both thresh map and thresh mask
- poly, thresh_target[idx], thresh_mask[idx] = self.draw_thresh_map(poly, thresh_target[idx],
- thresh_mask[idx])
-
- thresh_target = thresh_target.astype(np.float32) * (self.thresh_max - self.thresh_min) + self.thresh_min
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
- thresh_target = tf.convert_to_tensor(thresh_target, dtype=tf.float32)
- thresh_mask = tf.convert_to_tensor(thresh_mask, dtype=tf.bool)
-
- return seg_target, seg_mask, thresh_target, thresh_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- thresh_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts, masks, thresh_gts, thresh_masks from a list of boxes
- and a list of masks for each image. From there it computes the loss with the model output
-
- Args:
- out_map: output feature map of the model of shape (N, H, W, C)
- thresh_map: threshold map of shape (N, H, W, C)
- target: list of dictionaries, where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
-
- prob_map = tf.math.sigmoid(tf.squeeze(out_map, axis=[-1]))
- thresh_map = tf.math.sigmoid(tf.squeeze(thresh_map, axis=[-1]))
-
- seg_target, seg_mask, thresh_target, thresh_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute balanced BCE loss for proba_map
- bce_scale = 5.
- bce_loss = tf.keras.losses.binary_crossentropy(seg_target[..., None], out_map, from_logits=True)[seg_mask]
-
- neg_target = 1 - seg_target[seg_mask]
- positive_count = tf.math.reduce_sum(seg_target[seg_mask])
- negative_count = tf.math.reduce_min([tf.math.reduce_sum(neg_target), 3. * positive_count])
- negative_loss = bce_loss * neg_target
- negative_loss, _ = tf.nn.top_k(negative_loss, tf.cast(negative_count, tf.int32))
- sum_losses = tf.math.reduce_sum(bce_loss * seg_target[seg_mask]) + tf.math.reduce_sum(negative_loss)
- balanced_bce_loss = sum_losses / (positive_count + negative_count + 1e-6)
-
- # Compute dice loss for approxbin_map
- bin_map = 1 / (1 + tf.exp(-50. * (prob_map[seg_mask] - thresh_map[seg_mask])))
-
- bce_min = tf.math.reduce_min(bce_loss)
- weights = (bce_loss - bce_min) / (tf.math.reduce_max(bce_loss) - bce_min) + 1.
- inter = tf.math.reduce_sum(bin_map * seg_target[seg_mask] * weights)
- union = tf.math.reduce_sum(bin_map) + tf.math.reduce_sum(seg_target[seg_mask]) + 1e-8
- dice_loss = 1 - 2.0 * inter / union
-
- # Compute l1 loss for thresh_map
- l1_scale = 10.
- if tf.reduce_any(thresh_mask):
- l1_loss = tf.math.reduce_mean(tf.math.abs(thresh_map[thresh_mask] - thresh_target[thresh_mask]))
- else:
- l1_loss = tf.constant(0.)
-
- return l1_scale * l1_loss + bce_scale * balanced_bce_loss + dice_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- feat_maps = self.feat_extractor(x, **kwargs)
- feat_concat = self.fpn(feat_maps, **kwargs)
- logits = self.probability_head(feat_concat, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
-
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- thresh_map = self.threshold_head(feat_concat, **kwargs)
- loss = self.compute_loss(logits, thresh_map, target)
- out['loss'] = loss
-
- return out
-
-
-def _db_resnet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> DBNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['fpn_channels'] = kwargs.get('fpn_channels', _cfg['fpn_channels'])
-
- # Feature extractor
- resnet = tf.keras.applications.__dict__[_cfg['backbone']](
- include_top=False,
- weights=None,
- input_shape=_cfg['input_shape'],
- pooling=None,
- )
-
- feat_extractor = IntermediateLayerGetter(
- resnet,
- _cfg['fpn_layers'],
- )
-
- kwargs['fpn_channels'] = _cfg['fpn_channels']
-
- # Build the model
- model = DBNet(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def db_resnet50(pretrained: bool = False, **kwargs: Any) -> DBNet:
- """DBNet as described in `"Real-time Scene Text Detection with Differentiable Binarization"
- <https://arxiv.org/pdf/1911.08947.pdf>`_, using a ResNet-50 backbone.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import db_resnet50
- >>> model = db_resnet50(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _db_resnet('db_resnet50', pretrained, **kwargs)
-
-
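The box expansion in DBPostProcessor.polygon_to_box boils down to offsetting the polygon outward by distance = area * unclip_ratio / perimeter. A standalone numeric sketch of that step, using the same libraries as the code above on a toy rectangle (values are illustrative):

import numpy as np
import pyclipper
from shapely.geometry import Polygon

points = [(0, 0), (100, 0), (100, 20), (0, 20)]  # toy 100x20 box
poly = Polygon(points)
distance = poly.area * 1.5 / poly.length  # 2000 * 1.5 / 240 = 12.5 px with unclip_ratio=1.5

offset = pyclipper.PyclipperOffset()
offset.AddPath(points, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
expanded = np.asarray(offset.Execute(distance)[0])
print(expanded.min(axis=0), expanded.max(axis=0))  # roughly (-12, -12) and (112, 32)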
Source code for doctr.models.detection.differentiable_binarization.tensorflow
Source code for doctr.models.detection.fast.tensorflow
Source code for doctr.models.detection.linknet
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-# Credits: post-processing adapted from https://github.com/xuannianz/DifferentiableBinarization
-
-from copy import deepcopy
-import tensorflow as tf
-import numpy as np
-import cv2
-from tensorflow.keras import layers, Sequential
-from typing import Dict, Any, Tuple, Optional, List
-
-from .core import DetectionModel, DetectionPostProcessor
-from ..backbones import ResnetStage
-from ..utils import conv_sequence, load_pretrained_params
-from ...utils.repr import NestedObject
-
-__all__ = ['LinkNet', 'linknet', 'LinkNetPostProcessor']
-
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'linknet': {
- 'mean': (0.798, 0.785, 0.772),
- 'std': (0.264, 0.2749, 0.287),
- 'out_chan': 1,
- 'input_shape': (1024, 1024, 3),
- 'post_processor': 'LinkNetPostProcessor',
- 'url': None,
- },
-}
-
-
-class LinkNetPostProcessor(DetectionPostProcessor):
- """Implements a post processor for LinkNet model.
-
- Args:
- min_size_box: minimal length (pix) to keep a box
- box_thresh: minimal objectness score to consider a box
- bin_thresh: threshold used to binarize the p_map at inference time
-
- """
- def __init__(
- self,
- min_size_box: int = 3,
- bin_thresh: float = 0.15,
- box_thresh: float = 0.1,
- ) -> None:
- super().__init__(
- box_thresh,
- bin_thresh
- )
-
- def bitmap_to_boxes(
- self,
- pred: np.ndarray,
- bitmap: np.ndarray,
- ) -> np.ndarray:
- """Compute boxes from a bitmap/pred_map: find connected components then filter boxes
-
- Args:
- pred: Pred map from differentiable linknet output
- bitmap: Bitmap map computed from pred (binarized)
-
- Returns:
- a numpy array of boxes for the bitmap, each box being a 5-element list
- containing x, y, w, h, score for the box
- """
- label_num, labelimage = cv2.connectedComponents(bitmap.astype(np.uint8), connectivity=4)
- height, width = bitmap.shape[:2]
- min_size_box = 1 + int(height / 512)
- boxes = []
- for label in range(1, label_num + 1):
- points = np.array(np.where(labelimage == label)[::-1]).T
- if points.shape[0] < 4: # remove polygons with 3 points or less
- continue
- score = self.box_score(pred, points.reshape(-1, 2))
- if self.box_thresh > score: # remove polygons with a weak objectness
- continue
- x, y, w, h = cv2.boundingRect(points)
- if min(w, h) < min_size_box: # filter too small boxes
- continue
- # compute relative polygon to get rid of img shape
- xmin, ymin, xmax, ymax = x / width, y / height, (x + w) / width, (y + h) / height
- boxes.append([xmin, ymin, xmax, ymax, score])
- return np.clip(np.asarray(boxes), 0, 1) if len(boxes) > 0 else np.zeros((0, 5), dtype=np.float32)
-
-
-def decoder_block(in_chan: int, out_chan: int) -> Sequential:
- """Creates a LinkNet decoder block"""
-
- return Sequential([
- *conv_sequence(in_chan // 4, 'relu', True, kernel_size=1),
- layers.Conv2DTranspose(
- filters=in_chan // 4,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(out_chan, 'relu', True, kernel_size=1),
- ])
-
-
-class LinkNetFPN(layers.Layer, NestedObject):
- """LinkNet Encoder-Decoder module
-
- """
-
- def __init__(
- self,
- ) -> None:
-
- super().__init__()
- self.encoder_1 = ResnetStage(num_blocks=2, output_channels=64, downsample=True)
- self.encoder_2 = ResnetStage(num_blocks=2, output_channels=128, downsample=True)
- self.encoder_3 = ResnetStage(num_blocks=2, output_channels=256, downsample=True)
- self.encoder_4 = ResnetStage(num_blocks=2, output_channels=512, downsample=True)
- self.decoder_1 = decoder_block(in_chan=64, out_chan=64)
- self.decoder_2 = decoder_block(in_chan=128, out_chan=64)
- self.decoder_3 = decoder_block(in_chan=256, out_chan=128)
- self.decoder_4 = decoder_block(in_chan=512, out_chan=256)
-
- def call(
- self,
- x: tf.Tensor
- ) -> tf.Tensor:
- x_1 = self.encoder_1(x)
- x_2 = self.encoder_2(x_1)
- x_3 = self.encoder_3(x_2)
- x_4 = self.encoder_4(x_3)
- y_4 = self.decoder_4(x_4)
- y_3 = self.decoder_3(y_4 + x_3)
- y_2 = self.decoder_2(y_3 + x_2)
- y_1 = self.decoder_1(y_2 + x_1)
- return y_1
-
-
-class LinkNet(DetectionModel, NestedObject):
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Args:
- out_chan: number of channels for the output
- """
-
- def __init__(
- self,
- out_chan: int = 1,
- input_shape: Tuple[int, int, int] = (512, 512, 3),
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(cfg=cfg)
-
- self.stem = Sequential([
- *conv_sequence(64, 'relu', True, strides=2, kernel_size=7, input_shape=input_shape),
- layers.MaxPool2D(pool_size=(3, 3), strides=2, padding='same'),
- ])
-
- self.fpn = LinkNetFPN()
-
- self.classifier = Sequential([
- layers.Conv2DTranspose(
- filters=32,
- kernel_size=3,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- layers.BatchNormalization(),
- layers.Activation('relu'),
- *conv_sequence(32, 'relu', True, strides=1, kernel_size=3),
- layers.Conv2DTranspose(
- filters=out_chan,
- kernel_size=2,
- strides=2,
- padding="same",
- use_bias=False,
- kernel_initializer='he_normal'
- ),
- ])
-
- self.min_size_box = 3
-
- self.postprocessor = LinkNetPostProcessor()
-
- def compute_target(
- self,
- target: List[Dict[str, Any]],
- output_shape: Tuple[int, int, int],
- ) -> Tuple[tf.Tensor, tf.Tensor]:
-
- seg_target = np.zeros(output_shape, dtype=bool)
- seg_mask = np.ones(output_shape, dtype=bool)
-
- for idx, _target in enumerate(target):
- # Draw each polygon on gt
- if _target['boxes'].shape[0] == 0:
- # Empty image, full masked
- seg_mask[idx] = False
-
- # Absolute bounding boxes
- abs_boxes = _target['boxes'].copy()
- abs_boxes[:, [0, 2]] *= output_shape[-1]
- abs_boxes[:, [1, 3]] *= output_shape[-2]
- abs_boxes = abs_boxes.round().astype(np.int32)
-
- boxes_size = np.minimum(abs_boxes[:, 2] - abs_boxes[:, 0], abs_boxes[:, 3] - abs_boxes[:, 1])
-
- for box, box_size, is_ambiguous in zip(abs_boxes, boxes_size, _target['flags']):
- # Mask ambiguous boxes
- if is_ambiguous:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Mask boxes that are too small
- if box_size < self.min_size_box:
- seg_mask[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = False
- continue
- # Fill polygon with 1
- seg_target[idx, box[1]: box[3] + 1, box[0]: box[2] + 1] = True
-
- seg_target = tf.convert_to_tensor(seg_target, dtype=tf.float32)
- seg_mask = tf.convert_to_tensor(seg_mask, dtype=tf.bool)
-
- return seg_target, seg_mask
-
- def compute_loss(
- self,
- out_map: tf.Tensor,
- target: List[Dict[str, Any]]
- ) -> tf.Tensor:
- """Compute a batch of gts and masks from a list of boxes and a list of masks for each image
- Then, it computes the loss function with proba_map, gts and masks
-
- Args:
- out_map: output feature map of the model of shape N x H x W x 1
- target: list of dictionaries, where each dict has a `boxes` and a `flags` entry
-
- Returns:
- A loss tensor
- """
- seg_target, seg_mask = self.compute_target(target, out_map.shape[:3])
-
- # Compute BCE loss
- return tf.math.reduce_mean(tf.keras.losses.binary_crossentropy(
- seg_target[seg_mask],
- tf.squeeze(out_map, axis=[-1])[seg_mask],
- from_logits=True
- ))
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[Dict[str, Any]]] = None,
- return_model_output: bool = False,
- return_boxes: bool = False,
- **kwargs: Any,
- ) -> Dict[str, Any]:
-
- logits = self.stem(x)
- logits = self.fpn(logits)
- logits = self.classifier(logits)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output or target is None or return_boxes:
- prob_map = tf.math.sigmoid(logits)
- if return_model_output:
- out["out_map"] = prob_map
-
- if target is None or return_boxes:
- # Post-process boxes
- out["boxes"] = self.postprocessor(prob_map)
-
- if target is not None:
- loss = self.compute_loss(logits, target)
- out['loss'] = loss
-
- return out
-
-
-def _linknet(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> LinkNet:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['out_chan'] = kwargs.get('out_chan', _cfg['out_chan'])
-
- kwargs['out_chan'] = _cfg['out_chan']
- kwargs['input_shape'] = _cfg['input_shape']
- # Build the model
- model = LinkNet(cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def linknet(pretrained: bool = False, **kwargs: Any) -> LinkNet:
- """LinkNet as described in `"LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation"
- <https://arxiv.org/pdf/1707.03718.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import linknet
- >>> model = linknet(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 1024, 1024, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text detection dataset
-
- Returns:
- text detection architecture
- """
-
- return _linknet('linknet', pretrained, **kwargs)
-
-
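LinkNetPostProcessor.bitmap_to_boxes relies on plain connected-component analysis of the binarized map; the following self-contained sketch reproduces that core step on a toy bitmap (the blob position is arbitrary):

import cv2
import numpy as np

bitmap = np.zeros((64, 64), dtype=np.uint8)
bitmap[20:30, 10:50] = 1  # a single text blob

n_labels, label_img = cv2.connectedComponents(bitmap, connectivity=4)
for label in range(1, n_labels):  # label 0 is the background
    pts = np.array(np.where(label_img == label)[::-1]).T.astype(np.int32)  # (x, y) points
    x, y, w, h = cv2.boundingRect(pts)
    print(x, y, w, h)  # 10 20 40 10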
Source code for doctr.models.detection.linknet.tensorflow
Source code for doctr.models.detection.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import DetectionPredictor, DetectionPreProcessor
-from .. import detection
+from typing import Any, List
+
+from doctr.file_utils import is_tf_available, is_torch_available
+from .. import detection
+from ..detection.fast import reparameterize
+from ..preprocessor import PreProcessor
+from .predictor import DetectionPredictor
__all__ = ["detection_predictor"]
-ARCHS = ['db_resnet50', 'linknet']
+ARCHS: List[str]
+
+if is_tf_available():
+ ARCHS = [
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
+elif is_torch_available():
+ ARCHS = [
+ "db_resnet34",
+ "db_resnet50",
+ "db_mobilenet_v3_large",
+ "linknet_resnet18",
+ "linknet_resnet34",
+ "linknet_resnet50",
+ "fast_tiny",
+ "fast_small",
+ "fast_base",
+ ]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> DetectionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, assume_straight_pages: bool = True, **kwargs: Any) -> DetectionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- # Detection
- _model = detection.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
+ _model = detection.__dict__[arch](
+ pretrained=pretrained,
+ pretrained_backbone=kwargs.get("pretrained_backbone", True),
+ assume_straight_pages=assume_straight_pages,
+ )
+ # Reparameterize FAST models by default to lower inference latency and memory usage
+ if isinstance(_model, detection.FAST):
+ _model = reparameterize(_model)
+ else:
+ if not isinstance(arch, (detection.DBNet, detection.LinkNet, detection.FAST)):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+
+ _model = arch
+ _model.assume_straight_pages = assume_straight_pages
+ _model.postprocessor.assume_straight_pages = assume_straight_pages
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 2)
predictor = DetectionPredictor(
- DetectionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
+ PreProcessor(_model.cfg["input_shape"][:-1] if is_tf_available() else _model.cfg["input_shape"][1:], **kwargs),
+ _model,
)
return predictor
-[docs]
-def detection_predictor(arch: str = 'db_resnet50', pretrained: bool = False, **kwargs: Any) -> DetectionPredictor:
+[docs]
+def detection_predictor(
+ arch: Any = "fast_base",
+ pretrained: bool = False,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ batch_size: int = 2,
+ **kwargs: Any,
+) -> DetectionPredictor:
"""Text detection architecture.
- Example::
- >>> import numpy as np
- >>> from doctr.models import detection_predictor
- >>> model = detection_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import detection_predictor
+ >>> model = detection_predictor(arch='db_resnet50', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_resnet50')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'db_resnet50')
pretrained: If True, returns a model pre-trained on our text detection dataset
+ assume_straight_pages: If True, fit straight boxes to the page
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional keyword arguments passed to the architecture
Returns:
+ -------
Detection predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(
+ arch=arch,
+ pretrained=pretrained,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ batch_size=batch_size,
+ **kwargs,
+ )
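Beyond the docstring example, the new signature also accepts a model instance directly, as the isinstance branch above shows. A minimal sketch of both call styles (the random page stands in for a real document image):

import numpy as np
from doctr.models import detection, detection_predictor

page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)

# By architecture name
predictor = detection_predictor(arch="db_resnet50", pretrained=True)
out = predictor([page])

# By passing an already-built model (its config supplies mean/std and input shape)
model = detection.db_resnet50(pretrained=True)
predictor = detection_predictor(arch=model)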
Source code for doctr.models.export
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import logging
-import numpy as np
-import tensorflow as tf
-from tensorflow.keras import Model
-from typing import Tuple
-
-logging.getLogger("tensorflow").setLevel(logging.DEBUG)
-
-
-__all__ = ['convert_to_tflite', 'convert_to_fp16', 'quantize_model']
-
-
-
-[docs]
-def convert_to_tflite(tf_model: Model) -> bytes:
- """Converts a model to TFLite format
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_tflite, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_tflite(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
- return converter.convert()
-
-
-
-
-[docs]
-def convert_to_fp16(tf_model: Model) -> bytes:
- """Converts a model to half precision
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import convert_to_fp16, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = convert_to_fp16(model)
-
- Args:
- tf_model: a keras model
-
- Returns:
- bytes: the serialized FP16 model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
- converter.target_spec.supported_types = [tf.float16]
- return converter.convert()
-
-
-
-
-[docs]
-def quantize_model(tf_model: Model, input_shape: Tuple[int, int, int]) -> bytes:
- """Quantize a Tensorflow model
-
- Example::
- >>> from tensorflow.keras import Sequential
- >>> from doctr.models import quantize_model, conv_sequence
- >>> model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
- >>> serialized_model = quantize_model(model, (224, 224, 3))
-
- Args:
- tf_model: a keras model
- input_shape: shape of the expected input tensor (excluding batch dimension) with channel last order
-
- Returns:
- bytes: the serialized quantized model
- """
- converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
-
- converter.optimizations = [tf.lite.Optimize.DEFAULT]
-
- # Float fallback for operators that do not have an integer implementation
- def representative_dataset():
- for _ in range(100):
- data = np.random.rand(1, *input_shape)
- yield [data.astype(np.float32)]
-
- converter.representative_dataset = representative_dataset
- converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
- converter.inference_input_type = tf.int8
- converter.inference_output_type = tf.int8
-
- return converter.convert()
-
-
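Each converter above returns the serialized model as raw bytes, so persisting it is a plain binary write. A short sketch using convert_to_fp16 (the output file name is arbitrary):

from tensorflow.keras import Sequential
from doctr.models import conv_sequence, convert_to_fp16

model = Sequential(conv_sequence(32, 'relu', True, kernel_size=3, input_shape=(224, 224, 3)))
serialized_model = convert_to_fp16(model)

with open("model_fp16.tflite", "wb") as f:  # arbitrary output path
    f.write(serialized_model)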
Source code for doctr.models.factory.hub
Source code for doctr.models.recognition.crnn
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import layers
-from tensorflow.keras.models import Sequential
-from typing import Tuple, Dict, Any, Optional, List
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel, RecognitionPostProcessor
-
-__all__ = ['CRNN', 'crnn_vgg16_bn', 'crnn_resnet31', 'CTCPostProcessor']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'crnn_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/crnn_vgg16_bn-748c855f.zip',
- },
- 'crnn_resnet31': {
- 'mean': (0.694, 0.695, 0.693),
- 'std': (0.299, 0.296, 0.301),
- 'backbone': 'resnet31', 'rnn_units': 128,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'CTCPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.1/crnn_resnet31-69ab71db.zip',
- },
-}
-
-
-class CTCPostProcessor(RecognitionPostProcessor):
- """
- Postprocess raw prediction of the model (logits) to a list of words using CTC decoding
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def ctc_decoder(
- self,
- logits: tf.Tensor
- ) -> tf.Tensor:
- """
- Decode logits with CTC decoder from keras backend
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- decoded logits, shape BATCH_SIZE X SEQ_LEN
- """
- # computing prediction with ctc decoder
- _prediction = tf.nn.ctc_greedy_decoder(
- tf.nn.softmax(tf.transpose(logits, perm=[1, 0, 2])),
- tf.fill(logits.shape[0], logits.shape[1]),
- merge_repeated=True
- )[0][0]
- prediction = tf.sparse.to_dense(_prediction, default_value=len(self.vocab))
-
- return prediction
-
- def __call__(
- self,
- logits: tf.Tensor
- ) -> List[str]:
- """
- Performs decoding of raw output with CTC and decoding of CTC predictions
- with label_to_idx mapping dictionary
-
- Args:
- logits: raw output of the model, shape BATCH_SIZE X SEQ_LEN X NUM_CLASSES + 1
-
- Returns:
- A list of decoded words of length BATCH_SIZE
-
- """
- # decode ctc for ctc models
- predictions = self.ctc_decoder(logits)
-
- _decoded_strings_pred = tf.strings.reduce_join(
- inputs=tf.nn.embedding_lookup(self._embedding, predictions),
- axis=-1
- )
- _decoded_strings_pred = tf.strings.split(_decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(_decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-class CRNN(RecognitionModel):
- """Implements a CRNN architecture as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of units in the LSTM layers
- cfg: configuration dictionary
- """
- def __init__(
- self,
- feature_extractor: tf.keras.Model,
- vocab: str,
- rnn_units: int = 128,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
- super().__init__(vocab=vocab, cfg=cfg)
- self.feat_extractor = feature_extractor
-
- # Initialize kernels
- h, w, c = self.feat_extractor.output_shape[1:]
- self.max_length = w
-
- self.decoder = Sequential(
- [
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Bidirectional(layers.LSTM(units=rnn_units, return_sequences=True)),
- layers.Dense(units=len(vocab) + 1)
- ]
- )
- self.decoder.build(input_shape=(None, w, h * c))
-
- self.postprocessor = CTCPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- target: List[str],
- ) -> tf.Tensor:
- """Compute CTC loss for the model.
-
- Args:
- model_output: predicted logits of the model
- target: list of ground-truth strings, encoded internally into gt labels and sequence lengths
-
- Returns:
- The loss of the model on the batch
- """
- gt, seq_len = self.compute_target(target)
- batch_len = model_output.shape[0]
- input_length = model_output.shape[1] * tf.ones(shape=(batch_len))
- ctc_loss = tf.nn.ctc_loss(
- gt, model_output, seq_len, input_length, logits_time_major=False, blank_index=len(self.vocab)
- )
- return ctc_loss
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- # B x H x W x C --> B x W x H x C
- transposed_feat = tf.transpose(features, perm=[0, 2, 1, 3])
- w, h, c = transposed_feat.get_shape().as_list()[1:]
- # B x W x H x C --> B x W x H * C
- features_seq = tf.reshape(transposed_feat, shape=(-1, w, h * c))
- decoded_features = self.decoder(features_seq, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, target)
-
- return out
-
-
-def _crnn(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> CRNN:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[_cfg['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
-
- # Build the model
- model = CRNN(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, _cfg['url'])
-
- return model
-
-
-
-[docs]
-def crnn_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a VGG-16 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_vgg16_bn
- >>> model = crnn_vgg16_bn(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_vgg16_bn', pretrained, **kwargs)
-
-
-
-def crnn_resnet31(pretrained: bool = False, **kwargs: Any) -> CRNN:
- """CRNN with a resnet31 backbone as described in `"An End-to-End Trainable Neural Network for Image-based
- Sequence Recognition and Its Application to Scene Text Recognition" <https://arxiv.org/pdf/1507.05717.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import crnn_resnet31
- >>> model = crnn_resnet31(pretrained=True)
- >>> input_tensor = tf.random.uniform(shape=[1, 32, 128, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _crnn('crnn_resnet31', pretrained, **kwargs)
-
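For reference, a minimal standalone sketch (not library code) of the tf.nn.ctc_loss conventions relied on by CRNN.compute_loss above: batch-major logits of shape (B, T, len(vocab) + 1), with the CTC blank token at index len(vocab). The vocabulary and target values below are illustrative only.

import tensorflow as tf

vocab = "abc"
batch, timesteps = 2, 8
num_classes = len(vocab) + 1  # +1 for the CTC blank token

logits = tf.random.normal((batch, timesteps, num_classes))
# Encoded targets: indices into `vocab`, padded to a fixed width
labels = tf.constant([[0, 1, 2, 0], [2, 1, 0, 0]], dtype=tf.int32)
label_length = tf.constant([4, 3], dtype=tf.int32)
logit_length = tf.fill([batch], timesteps)

loss = tf.nn.ctc_loss(
    labels, logits, label_length, logit_length,
    logits_time_major=False, blank_index=len(vocab),
)
print(loss.shape)  # (2,): one loss value per sample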
Source code for doctr.models.recognition.crnn.tensorflow
Source code for doctr.models.recognition.master.tensorflow
Source code for doctr.models.recognition.parseq.tensorflow
Source code for doctr.models.recognition.sar
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-from copy import deepcopy
-import tensorflow as tf
-from tensorflow.keras import Sequential, layers
-from typing import Tuple, Dict, List, Any, Optional
-
-from .. import backbones
-from ..utils import load_pretrained_params
-from .core import RecognitionModel
-from .core import RecognitionPostProcessor
-from doctr.utils.repr import NestedObject
-
-__all__ = ['SAR', 'SARPostProcessor', 'sar_vgg16_bn', 'sar_resnet31']
-
-default_cfgs: Dict[str, Dict[str, Any]] = {
- 'sar_vgg16_bn': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'vgg16_bn', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1-models/sar_vgg16bn-0d7e2c26.zip',
- },
- 'sar_resnet31': {
- 'mean': (.5, .5, .5),
- 'std': (1., 1., 1.),
- 'backbone': 'resnet31', 'rnn_units': 512, 'max_length': 30, 'num_decoders': 2,
- 'input_shape': (32, 128, 3),
- 'post_processor': 'SARPostProcessor',
- 'vocab': ('3K}7eé;5àÎYho]QwV6qU~W"XnbBvcADfËmy.9ÔpÛ*{CôïE%M4#ÈR:g@T$x?0î£|za1ù8,OG€P-'
- 'kçHëÀÂ2É/ûIJ\'j(LNÙFut[)èZs+&°Sd=Ï!<â_Ç>rêi`l'),
- 'url': 'https://github.com/mindee/doctr/releases/download/v0.1.0/sar_resnet31-ea202587.zip',
- },
-}
-
-
-class AttentionModule(layers.Layer, NestedObject):
- """Implements attention module of the SAR model
-
- Args:
- attention_units: number of hidden attention units
-
- """
- def __init__(
- self,
- attention_units: int
- ) -> None:
-
- super().__init__()
- self.hidden_state_projector = layers.Conv2D(
- attention_units, 1, strides=1, use_bias=False, padding='same', kernel_initializer='he_normal',
- )
- self.features_projector = layers.Conv2D(
- attention_units, 3, strides=1, use_bias=True, padding='same', kernel_initializer='he_normal',
- )
- self.attention_projector = layers.Conv2D(
- 1, 1, strides=1, use_bias=False, padding="same", kernel_initializer='he_normal',
- )
- self.flatten = layers.Flatten()
-
- def call(
- self,
- features: tf.Tensor,
- hidden_state: tf.Tensor,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- [H, W] = features.get_shape().as_list()[1:3]
- # shape (N, 1, 1, rnn_units) -> (N, 1, 1, attention_units)
- hidden_state_projection = self.hidden_state_projector(hidden_state, **kwargs)
- # shape (N, H, W, vgg_units) -> (N, H, W, attention_units)
- features_projection = self.features_projector(features, **kwargs)
- projection = tf.math.tanh(hidden_state_projection + features_projection)
- # shape (N, H, W, attention_units) -> (N, H, W, 1)
- attention = self.attention_projector(projection, **kwargs)
- # shape (N, H, W, 1) -> (N, H * W)
- attention = self.flatten(attention)
- attention = tf.nn.softmax(attention)
- # shape (N, H * W) -> (N, H, W, 1)
- attention_map = tf.reshape(attention, [-1, H, W, 1])
- glimpse = tf.math.multiply(features, attention_map)
- # shape (N, H, W, C) -> (N, C)
- glimpse = tf.reduce_sum(glimpse, axis=[1, 2])
- return glimpse
-
-
-class SARDecoder(layers.Layer, NestedObject):
- """Implements decoder module of the SAR model
-
- Args:
- rnn_units: number of hidden units in recurrent cells
- max_length: maximum length of a sequence
- vocab_size: number of classes in the model alphabet
- embedding_units: number of hidden embedding units
- attention_units: number of hidden attention units
- num_decoder_layers: number of LSTM layers to stack
-
- """
- def __init__(
- self,
- rnn_units: int,
- max_length: int,
- vocab_size: int,
- embedding_units: int,
- attention_units: int,
- num_decoder_layers: int = 2,
- input_shape: Optional[List[Tuple[Optional[int]]]] = None,
- ) -> None:
-
- super().__init__()
- self.vocab_size = vocab_size
- self.lstm_decoder = layers.StackedRNNCells(
- [layers.LSTMCell(rnn_units, dtype=tf.float32, implementation=1) for _ in range(num_decoder_layers)]
- )
- self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))
- self.attention_module = AttentionModule(attention_units)
- self.output_dense = layers.Dense(vocab_size + 1, use_bias=True, input_shape=(None, 2 * rnn_units))
- self.max_length = max_length
-
- # Initialize kernels
- if input_shape is not None:
- self.attention_module.call(layers.Input(input_shape[0][1:]), layers.Input((1, 1, rnn_units)))
-
- def call(
- self,
- features: tf.Tensor,
- holistic: tf.Tensor,
- gt: Optional[tf.Tensor] = None,
- **kwargs: Any,
- ) -> tf.Tensor:
-
- # initialize states (each of shape (N, rnn_units))
- states = self.lstm_decoder.get_initial_state(
- inputs=None, batch_size=features.shape[0], dtype=tf.float32
- )
- # run first step of lstm
- # holistic: shape (N, rnn_units)
- _, states = self.lstm_decoder(holistic, states, **kwargs)
- # Initialize with the index of virtual START symbol (placed after <eos>)
- symbol = tf.fill([features.shape[0]], self.vocab_size + 1)
- logits_list = []
- if kwargs.get('training') and gt is None:
- raise ValueError('Need to provide labels during training for teacher forcing')
- for t in range(self.max_length + 1): # keep 1 step for <eos>
- # one-hot symbol with depth vocab_size + 1
- # embedded_symbol: shape (N, embedding_units)
- embedded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
- logits, states = self.lstm_decoder(embedded_symbol, states, **kwargs)
- glimpse = self.attention_module(
- features, tf.expand_dims(tf.expand_dims(logits, axis=1), axis=1), **kwargs,
- )
- # logits: shape (N, rnn_units), glimpse: shape (N, C)
- logits = tf.concat([logits, glimpse], axis=-1)
- # shape (N, rnn_units + C) -> (N, vocab_size + 1)
- logits = self.output_dense(logits, **kwargs)
- # update symbol with predicted logits for t+1 step
- if kwargs.get('training'):
- symbol = gt[:, t]
- else:
- symbol = tf.argmax(logits, axis=-1)
- logits_list.append(logits)
- outputs = tf.stack(logits_list, axis=1) # shape (N, max_length + 1, vocab_size + 1)
-
- return outputs
-
-
-class SAR(RecognitionModel):
- """Implements a SAR architecture as described in `"Show, Attend and Read:A Simple and Strong Baseline for
- Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Args:
- feature_extractor: the backbone serving as feature extractor
- vocab: vocabulary used for encoding
- rnn_units: number of hidden units in both encoder and decoder LSTM
- embedding_units: number of embedding units
- attention_units: number of hidden units in attention module
- max_length: maximum word length handled by the model
- num_decoders: number of LSTM to stack in decoder layer
-
- """
-
- _children_names: List[str] = ['feat_extractor', 'encoder', 'decoder']
-
- def __init__(
- self,
- feature_extractor,
- vocab: str,
- rnn_units: int = 512,
- embedding_units: int = 512,
- attention_units: int = 512,
- max_length: int = 30,
- num_decoders: int = 2,
- cfg: Optional[Dict[str, Any]] = None,
- ) -> None:
-
- super().__init__(vocab=vocab, cfg=cfg)
-
- self.max_length = max_length + 1 # Add 1 timestep for EOS after the longest word
-
- self.feat_extractor = feature_extractor
-
- self.encoder = Sequential(
- [
- layers.LSTM(units=rnn_units, return_sequences=True),
- layers.LSTM(units=rnn_units, return_sequences=False)
- ]
- )
- # Initialize the kernels (watch out for reduce_max)
- self.encoder.build(input_shape=(None,) + self.feat_extractor.output_shape[2:])
-
- self.decoder = SARDecoder(
- rnn_units, max_length, len(vocab), embedding_units, attention_units, num_decoders,
- input_shape=[self.feat_extractor.output_shape, self.encoder.output_shape]
- )
-
- self.postprocessor = SARPostProcessor(vocab=vocab)
-
- def compute_loss(
- self,
- model_output: tf.Tensor,
- gt: tf.Tensor,
- seq_len: tf.Tensor,
- ) -> tf.Tensor:
- """Compute categorical cross-entropy loss for the model.
- Sequences are masked after the EOS character.
-
- Args:
- gt: the encoded tensor with gt labels
- model_output: predicted logits of the model
- seq_len: lengths of each gt word inside the batch
-
- Returns:
- The loss of the model on the batch
- """
- # Input length : number of timesteps
- input_len = tf.shape(model_output)[1]
- # Add one for additional <eos> token
- seq_len = seq_len + 1
- # One-hot gt labels
- oh_gt = tf.one_hot(gt, depth=model_output.shape[2])
- # Compute loss
- cce = tf.nn.softmax_cross_entropy_with_logits(oh_gt, model_output)
- # Compute mask
- mask_values = tf.zeros_like(cce)
- mask_2d = tf.sequence_mask(seq_len, input_len)
- masked_loss = tf.where(mask_2d, cce, mask_values)
- ce_loss = tf.math.divide(tf.reduce_sum(masked_loss, axis=1), tf.cast(seq_len, tf.float32))
- return tf.expand_dims(ce_loss, axis=1)
-
- def call(
- self,
- x: tf.Tensor,
- target: Optional[List[str]] = None,
- return_model_output: bool = False,
- return_preds: bool = False,
- **kwargs: Any,
- ) -> Dict[str, tf.Tensor]:
-
- features = self.feat_extractor(x, **kwargs)
- pooled_features = tf.reduce_max(features, axis=1) # vertical max pooling
- encoded = self.encoder(pooled_features, **kwargs)
- if target is not None:
- gt, seq_len = self.compute_target(target)
- decoded_features = self.decoder(features, encoded, gt=None if target is None else gt, **kwargs)
-
- out: Dict[str, tf.Tensor] = {}
- if return_model_output:
- out["out_map"] = decoded_features
-
- if target is None or return_preds:
- # Post-process boxes
- out["preds"] = self.postprocessor(decoded_features)
-
- if target is not None:
- out['loss'] = self.compute_loss(decoded_features, gt, seq_len)
-
- return out
-
-
-class SARPostProcessor(RecognitionPostProcessor):
- """Post processor for SAR architectures
-
- Args:
- vocab: string containing the ordered sequence of supported characters
- ignore_case: if True, ignore case of letters
- ignore_accents: if True, ignore accents of letters
- """
-
- def __call__(
- self,
- logits: tf.Tensor,
- ) -> List[str]:
- # compute pred with argmax for attention models
- pred = tf.math.argmax(logits, axis=2)
-
- # decode raw output of the model with tf_label_to_idx
- pred = tf.cast(pred, dtype='int32')
- decoded_strings_pred = tf.strings.reduce_join(inputs=tf.nn.embedding_lookup(self._embedding, pred), axis=-1)
- decoded_strings_pred = tf.strings.split(decoded_strings_pred, "<eos>")
- decoded_strings_pred = tf.sparse.to_dense(decoded_strings_pred.to_sparse(), default_value='not valid')[:, 0]
- words_list = [word.decode() for word in list(decoded_strings_pred.numpy())]
-
- if self.ignore_case:
- words_list = [word.lower() for word in words_list]
-
- if self.ignore_accents:
- raise NotImplementedError
-
- return words_list
-
-
-def _sar(arch: str, pretrained: bool, input_shape: Optional[Tuple[int, int, int]] = None, **kwargs: Any) -> SAR:
-
- # Patch the config
- _cfg = deepcopy(default_cfgs[arch])
- _cfg['input_shape'] = input_shape or _cfg['input_shape']
- _cfg['vocab'] = kwargs.get('vocab', _cfg['vocab'])
- _cfg['rnn_units'] = kwargs.get('rnn_units', _cfg['rnn_units'])
- _cfg['embedding_units'] = kwargs.get('embedding_units', _cfg['rnn_units'])
- _cfg['attention_units'] = kwargs.get('attention_units', _cfg['rnn_units'])
- _cfg['max_length'] = kwargs.get('max_length', _cfg['max_length'])
- _cfg['num_decoders'] = kwargs.get('num_decoders', _cfg['num_decoders'])
-
- # Feature extractor
- feat_extractor = backbones.__dict__[default_cfgs[arch]['backbone']](
- input_shape=_cfg['input_shape'],
- include_top=False,
- )
-
- kwargs['vocab'] = _cfg['vocab']
- kwargs['rnn_units'] = _cfg['rnn_units']
- kwargs['embedding_units'] = _cfg['embedding_units']
- kwargs['attention_units'] = _cfg['attention_units']
- kwargs['max_length'] = _cfg['max_length']
- kwargs['num_decoders'] = _cfg['num_decoders']
-
- # Build the model
- model = SAR(feat_extractor, cfg=_cfg, **kwargs)
- # Load pretrained parameters
- if pretrained:
- load_pretrained_params(model, default_cfgs[arch]['url'])
-
- return model
-
-
-
-[docs]
-def sar_vgg16_bn(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a VGG16 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_vgg16_bn
- >>> model = sar_vgg16_bn(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_vgg16_bn', pretrained, **kwargs)
-
-
-
-
-[docs]
-def sar_resnet31(pretrained: bool = False, **kwargs: Any) -> SAR:
- """SAR with a resnet-31 feature extractor as described in `"Show, Attend and Read:A Simple and Strong
- Baseline for Irregular Text Recognition" <https://arxiv.org/pdf/1811.00751.pdf>`_.
-
- Example::
- >>> import tensorflow as tf
- >>> from doctr.models import sar_resnet31
- >>> model = sar_resnet31(pretrained=False)
- >>> input_tensor = tf.random.uniform(shape=[1, 64, 256, 3], maxval=1, dtype=tf.float32)
- >>> out = model(input_tensor)
-
- Args:
- pretrained (bool): If True, returns a model pre-trained on our text recognition dataset
-
- Returns:
- text recognition architecture
- """
-
- return _sar('sar_resnet31', pretrained, **kwargs)
-
-
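For clarity, a hedged standalone sketch of the additive attention implemented by AttentionModule above: the hidden state and the feature map are each projected, combined through a tanh, scored per spatial position, softmax-normalized, and used to pool a glimpse vector. The shapes and layer instances below are illustrative, not library code.

import tensorflow as tf

N, H, W, C, units = 1, 4, 8, 16, 32
features = tf.random.normal((N, H, W, C))    # backbone feature map
hidden = tf.random.normal((N, 1, 1, units))  # decoder hidden state, reshaped

feat_proj = tf.keras.layers.Conv2D(units, 3, padding="same")(features)
hid_proj = tf.keras.layers.Conv2D(units, 1, use_bias=False)(hidden)
scores = tf.keras.layers.Conv2D(1, 1, use_bias=False)(tf.tanh(hid_proj + feat_proj))
attention = tf.nn.softmax(tf.reshape(scores, (N, H * W)))  # over spatial positions
glimpse = tf.reduce_sum(features * tf.reshape(attention, (N, H, W, 1)), axis=[1, 2])
print(glimpse.shape)  # (1, 16): one pooled feature vector per image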
Source code for doctr.models.recognition.sar.tensorflow
Source code for doctr.models.recognition.vitstr.tensorflow
Source code for doctr.models.recognition.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-from typing import Dict, Any
-from .core import RecognitionPredictor, RecognitionPreProcessor
-from .. import recognition
+from typing import Any, List
+from doctr.file_utils import is_tf_available
+from doctr.models.preprocessor import PreProcessor
+
+from .. import recognition
+from .predictor import RecognitionPredictor
__all__ = ["recognition_predictor"]
-ARCHS = ['crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31']
+ARCHS: List[str] = [
+ "crnn_vgg16_bn",
+ "crnn_mobilenet_v3_small",
+ "crnn_mobilenet_v3_large",
+ "sar_resnet31",
+ "master",
+ "vitstr_small",
+ "vitstr_base",
+ "parseq",
+]
-def _predictor(arch: str, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
- if arch not in ARCHS:
- raise ValueError(f"unknown architecture '{arch}'")
+def _predictor(arch: Any, pretrained: bool, **kwargs: Any) -> RecognitionPredictor:
+ if isinstance(arch, str):
+ if arch not in ARCHS:
+ raise ValueError(f"unknown architecture '{arch}'")
- _model = recognition.__dict__[arch](pretrained=pretrained)
- kwargs['mean'] = kwargs.get('mean', _model.cfg['mean'])
- kwargs['std'] = kwargs.get('std', _model.cfg['std'])
- predictor = RecognitionPredictor(
- RecognitionPreProcessor(output_size=_model.cfg['input_shape'][:2], **kwargs),
- _model
- )
+ _model = recognition.__dict__[arch](
+ pretrained=pretrained, pretrained_backbone=kwargs.get("pretrained_backbone", True)
+ )
+ else:
+ if not isinstance(
+ arch, (recognition.CRNN, recognition.SAR, recognition.MASTER, recognition.ViTSTR, recognition.PARSeq)
+ ):
+ raise ValueError(f"unknown architecture: {type(arch)}")
+ _model = arch
+
+ kwargs.pop("pretrained_backbone", None)
+
+ kwargs["mean"] = kwargs.get("mean", _model.cfg["mean"])
+ kwargs["std"] = kwargs.get("std", _model.cfg["std"])
+ kwargs["batch_size"] = kwargs.get("batch_size", 128)
+ input_shape = _model.cfg["input_shape"][:2] if is_tf_available() else _model.cfg["input_shape"][-2:]
+ predictor = RecognitionPredictor(PreProcessor(input_shape, preserve_aspect_ratio=True, **kwargs), _model)
return predictor
-[docs]
-def recognition_predictor(arch: str = 'crnn_vgg16_bn', pretrained: bool = False, **kwargs: Any) -> RecognitionPredictor:
+[docs]
+def recognition_predictor(
+ arch: Any = "crnn_vgg16_bn",
+ pretrained: bool = False,
+ symmetric_pad: bool = False,
+ batch_size: int = 128,
+ **kwargs: Any,
+) -> RecognitionPredictor:
"""Text recognition architecture.
Example::
@@ -313,14 +369,18 @@ Source code for doctr.models.recognition.zoo
>>> out = model([input_page])
Args:
- arch: name of the architecture to use ('crnn_vgg16_bn', 'crnn_resnet31', 'sar_vgg16_bn', 'sar_resnet31')
+ ----
+ arch: name of the architecture or model itself to use (e.g. 'crnn_vgg16_bn')
pretrained: If True, returns a model pre-trained on our text recognition dataset
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right
+ batch_size: number of samples the model processes in parallel
+ **kwargs: optional parameters to be passed to the architecture
Returns:
+ -------
Recognition predictor
"""
-
- return _predictor(arch, pretrained, **kwargs)
+ return _predictor(arch=arch, pretrained=pretrained, symmetric_pad=symmetric_pad, batch_size=batch_size, **kwargs)
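A short usage sketch of the refactored recognition zoo (this assumes network access to download the pretrained weights; the predictor returns a list of (value, confidence) pairs):

import numpy as np
from doctr.models import recognition_predictor

predictor = recognition_predictor("crnn_vgg16_bn", pretrained=True, batch_size=64)
crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)  # a single word crop
print(predictor([crop]))  # e.g. [('word', 0.98)]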
Source code for doctr.models.zoo
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
from typing import Any
-from .core import OCRPredictor
+
from .detection.zoo import detection_predictor
+from .kie_predictor import KIEPredictor
+from .predictor import OCRPredictor
from .recognition.zoo import recognition_predictor
+__all__ = ["ocr_predictor", "kie_predictor"]
-__all__ = ["ocr_predictor"]
-
-
-def _predictor(det_arch: str, reco_arch: str, pretrained: bool, det_bs=2, reco_bs=128) -> OCRPredictor:
+def _predictor(
+ det_arch: Any,
+ reco_arch: Any,
+ pretrained: bool,
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ det_bs: int = 2,
+ reco_bs: int = 128,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs,
+) -> OCRPredictor:
# Detection
- det_predictor = detection_predictor(det_arch, pretrained=pretrained, batch_size=det_bs)
+ det_predictor = detection_predictor(
+ det_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=det_bs,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ )
# Recognition
- reco_predictor = recognition_predictor(reco_arch, pretrained=pretrained, batch_size=reco_bs)
+ reco_predictor = recognition_predictor(
+ reco_arch,
+ pretrained=pretrained,
+ pretrained_backbone=pretrained_backbone,
+ batch_size=reco_bs,
+ )
- return OCRPredictor(det_predictor, reco_predictor)
+ return OCRPredictor(
+ det_predictor,
+ reco_predictor,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
-[docs]
+[docs]
def ocr_predictor(
- det_arch: str = 'db_resnet50',
- reco_arch: str = 'crnn_vgg16_bn',
+ det_arch: Any = "fast_base",
+ reco_arch: Any = "crnn_vgg16_bn",
pretrained: bool = False,
- **kwargs: Any
+ pretrained_backbone: bool = True,
+ assume_straight_pages: bool = True,
+ preserve_aspect_ratio: bool = True,
+ symmetric_pad: bool = True,
+ export_as_straight_boxes: bool = False,
+ detect_orientation: bool = False,
+ straighten_pages: bool = False,
+ detect_language: bool = False,
+ **kwargs: Any,
) -> OCRPredictor:
"""End-to-end OCR architecture using one model for localization, and another for text recognition.
- Example::
- >>> import numpy as np
- >>> from doctr.models import ocr_predictor
- >>> model = ocr_predictor(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([input_page])
+ >>> import numpy as np
+ >>> from doctr.models import ocr_predictor
+ >>> model = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([input_page])
Args:
- arch: name of the architecture to use ('db_sar_vgg', 'db_sar_resnet', 'db_crnn_vgg', 'db_crnn_resnet')
+ ----
+ det_arch: name of the detection architecture or the model itself to use
+ (e.g. 'db_resnet50', 'db_mobilenet_v3_large')
+ reco_arch: name of the recognition architecture or the model itself to use
+ (e.g. 'crnn_vgg16_bn', 'sar_resnet31')
pretrained: If True, returns a model pre-trained on our OCR dataset
+ pretrained_backbone: If True, returns a model with a pretrained backbone
+ assume_straight_pages: if True, speeds up the inference by assuming you only pass straight pages
+ without rotated textual elements.
+ preserve_aspect_ratio: If True, pad the input document image to preserve the aspect ratio before
+ running the detection model on it.
+ symmetric_pad: if True, pad the image symmetrically instead of padding at the bottom-right.
+ export_as_straight_boxes: when assume_straight_pages is set to False, export final predictions
+ (potentially rotated) as straight bounding boxes.
+ detect_orientation: if True, the estimated general page orientation will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ straighten_pages: if True, estimates the page general orientation
+ based on the segmentation map median line orientation.
+ Then, rotates page before passing it again to the deep learning detection module.
+ Doing so will improve performances for documents with page-uniform rotations.
+ detect_language: if True, the language prediction will be added to the predictions for each
+ page. Doing so will slightly deteriorate the overall latency.
+ kwargs: keyword args of `OCRPredictor`
Returns:
+ -------
OCR predictor
"""
+ return _predictor(
+ det_arch,
+ reco_arch,
+ pretrained,
+ pretrained_backbone=pretrained_backbone,
+ assume_straight_pages=assume_straight_pages,
+ preserve_aspect_ratio=preserve_aspect_ratio,
+ symmetric_pad=symmetric_pad,
+ export_as_straight_boxes=export_as_straight_boxes,
+ detect_orientation=detect_orientation,
+ straighten_pages=straighten_pages,
+ detect_language=detect_language,
+ **kwargs,
+ )
+
+
- return _predictor(det_arch, reco_arch, pretrained, **kwargs)
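A short usage sketch of the refactored ocr_predictor, spelling out a few of the new defaults explicitly (again assuming the pretrained weights can be downloaded):

import numpy as np
from doctr.models import ocr_predictor

model = ocr_predictor(
    det_arch="fast_base",
    reco_arch="crnn_vgg16_bn",
    pretrained=True,
    assume_straight_pages=True,  # skip rotation handling for speed
    preserve_aspect_ratio=True,
    symmetric_pad=True,
)
page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
result = model([page])
print(result.render())  # plain-text reconstruction of the page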
Source code for doctr.transforms.modules
-# Copyright (C) 2021, Mindee.
-
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
-
-import random
-import tensorflow as tf
-from typing import List, Any, Tuple, Callable
-
-from doctr.utils.repr import NestedObject
-from . import functional as F
-
-
-__all__ = ['Compose', 'Resize', 'Normalize', 'LambdaTransformation', 'ToGray', 'ColorInversion',
- 'RandomBrightness', 'RandomContrast', 'RandomSaturation', 'RandomHue', 'RandomGamma', 'RandomJpegQuality',
- 'OneOf', 'RandomApply']
-
-
-
-[docs]
-class Compose(NestedObject):
- """Implements a wrapper that will apply transformations sequentially
-
- Example::
- >>> from doctr.transforms import Compose, Resize
- >>> import tensorflow as tf
- >>> transfos = Compose([Resize((32, 32))])
- >>> out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformation modules
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, x: Any) -> Any:
- for t in self.transforms:
- x = t(x)
-
- return x
-
-
-
-
-[docs]
-class Resize(NestedObject):
- """Resizes a tensor to a target size
-
- Example::
- >>> from doctr.transforms import Resize
- >>> import tensorflow as tf
- >>> transfo = Resize((32, 32))
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- output_size: expected output size
- method: interpolation method
- preserve_aspect_ratio: if `True`, preserve aspect ratio and pad the rest with zeros
- """
- def __init__(
- self,
- output_size: Tuple[int, int],
- method: str = 'bilinear',
- preserve_aspect_ratio: bool = False,
- ) -> None:
- self.output_size = output_size
- self.method = method
- self.preserve_aspect_ratio = preserve_aspect_ratio
-
- def extra_repr(self) -> str:
- return f"output_size={self.output_size}, method='{self.method}'"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img = tf.image.resize(img, self.output_size, self.method, self.preserve_aspect_ratio)
- if self.preserve_aspect_ratio:
- img = tf.image.pad_to_bounding_box(img, 0, 0, *self.output_size)
- return img
-
-
-
-
-[docs]
-class Normalize(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import Normalize
- >>> import tensorflow as tf
- >>> transfo = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- mean: average value per channel
- std: standard deviation per channel
- """
- def __init__(self, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
- self.mean = tf.constant(mean, dtype=tf.float32)
- self.std = tf.constant(std, dtype=tf.float32)
-
- def extra_repr(self) -> str:
- return f"mean={self.mean.numpy().tolist()}, std={self.std.numpy().tolist()}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- img -= self.mean
- img /= self.std
- return img
-
-
-
-
-[docs]
-class LambdaTransformation(NestedObject):
- """Normalize a tensor to a Gaussian distribution for each channel
-
- Example::
- >>> from doctr.transforms import LambdaTransformation
- >>> import tensorflow as tf
- >>> transfo = LambdaTransformation(lambda x: x/ 255.)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- fn: the function to be applied to the input tensor
- """
- def __init__(self, fn: Callable[[tf.Tensor], tf.Tensor]) -> None:
- self.fn = fn
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return self.fn(img)
-
-
-
-
-[docs]
-class ToGray(NestedObject):
- """Convert a RGB tensor (batch of images or image) to a 3-channels grayscale tensor
-
- Example::
- >>> from doctr.transforms import ToGray
- >>> import tensorflow as tf
- >>> transfo = ToGray()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
- """
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.rgb_to_grayscale(img)
-
-
-
-
-[docs]
-class ColorInversion(NestedObject):
- """Applies the following tranformation to a tensor (image or batch of images):
- convert to grayscale, colorize (shift 0-values randomly), and then invert colors
-
- Example::
- >>> from doctr.transforms import ColorInversion
- >>> import tensorflow as tf
- >>> transfo = ColorInversion(min_val=0.6)
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_val: range [min_val, 1] to colorize RGB pixels
- """
- def __init__(self, min_val: float = 0.6) -> None:
- self.min_val = min_val
-
- def extra_repr(self) -> str:
- return f"min_val={self.min_val}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return F.invert_colors(img, self.min_val)
-
-
-
-
-[docs]
-class RandomBrightness(NestedObject):
- """Randomly adjust brightness of a tensor (batch of images or image) by adding a delta
- to all pixels
-
- Example::
- >>> from doctr.transforms import RandomBrightness
- >>> import tensorflow as tf
- >>> transfo = RandomBrightness()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_brightness(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomContrast(NestedObject):
- """Randomly adjust contrast of a tensor (batch of images or image) by adjusting
- each pixel: (img - mean) * contrast_factor + mean.
-
- Example::
- >>> from doctr.transforms import RandomContrast
- >>> import tensorflow as tf
- >>> transfo = RandomContrast()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce contrast if factor<1)
- """
- def __init__(self, delta: float = .3) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_contrast(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomSaturation(NestedObject):
- """Randomly adjust saturation of a tensor (batch of images or image) by converting to HSV and
- increasing saturation by a factor.
-
- Example::
- >>> from doctr.transforms import RandomSaturation
- >>> import tensorflow as tf
- >>> transfo = RandomSaturation()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- delta: multiplicative factor is picked in [1-delta, 1+delta] (reduce saturation if factor<1)
- """
- def __init__(self, delta: float = .5) -> None:
- self.delta = delta
-
- def extra_repr(self) -> str:
- return f"delta={self.delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_saturation(img, lower=1 - self.delta, upper=1 / (1 - self.delta))
-
-
-
-
-[docs]
-class RandomHue(NestedObject):
- """Randomly adjust hue of a tensor (batch of images or image) by converting to HSV and adding a delta
-
- Example::
- >>> from doctr.transforms import RandomHue
- >>> import tensorflow as tf
- >>> transfo = RandomHue()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- max_delta: offset to add to each pixel is randomly picked in [-max_delta, max_delta]
- """
- def __init__(self, max_delta: float = 0.3) -> None:
- self.max_delta = max_delta
-
- def extra_repr(self) -> str:
- return f"max_delta={self.max_delta}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_hue(img, max_delta=self.max_delta)
-
-
-
-
-[docs]
-class RandomGamma(NestedObject):
- """randomly performs gamma correction for a tensor (batch of images or image)
-
- Example::
- >>> from doctr.transforms import RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomGamma()
- >>> out = transfo(tf.random.uniform(shape=[8, 64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_gamma: non-negative real number, lower bound for gamma param
- max_gamma: non-negative real number, upper bound for gamma
- min_gain: lower bound for constant multiplier
- max_gain: upper bound for constant multiplier
- """
- def __init__(
- self,
- min_gamma: float = 0.5,
- max_gamma: float = 1.5,
- min_gain: float = 0.8,
- max_gain: float = 1.2,
- ) -> None:
- self.min_gamma = min_gamma
- self.max_gamma = max_gamma
- self.min_gain = min_gain
- self.max_gain = max_gain
-
- def extra_repr(self) -> str:
- return f"""gamma_range=({self.min_gamma}, {self.max_gamma}),
- gain_range=({self.min_gain}, {self.max_gain})"""
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- gamma = random.uniform(self.min_gamma, self.max_gamma)
- gain = random.uniform(self.min_gain, self.max_gain)
- return tf.image.adjust_gamma(img, gamma=gamma, gain=gain)
-
-
-
-
-[docs]
-class RandomJpegQuality(NestedObject):
- """Randomly adjust jpeg quality of a 3 dimensional RGB image
-
- Example::
- >>> from doctr.transforms import RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = RandomJpegQuality()
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- min_quality: int between [0, 100]
- max_quality: int between [0, 100]
- """
- def __init__(self, min_quality: int = 60, max_quality: int = 100) -> None:
- self.min_quality = min_quality
- self.max_quality = max_quality
-
- def extra_repr(self) -> str:
- return f"min_quality={self.min_quality}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- return tf.image.random_jpeg_quality(
- img, min_jpeg_quality=self.min_quality, max_jpeg_quality=self.max_quality
- )
-
-
-
-
-[docs]
-class OneOf(NestedObject):
- """Randomly apply one of the input transformations
-
- Example::
- >>> from doctr.transforms import OneOf, RandomGamma, RandomJpegQuality
- >>> import tensorflow as tf
- >>> transfo = OneOf([RandomJpegQuality(), RandomGamma()])
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transforms: list of transformations, one only will be picked
- """
-
- _children_names: List[str] = ['transforms']
-
- def __init__(self, transforms: List[NestedObject]) -> None:
- self.transforms = transforms
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- # Pick transformation
- transfo = self.transforms[int(random.random() * len(self.transforms))]
- # Apply
- return transfo(img)
-
-
-
-
-[docs]
-class RandomApply(NestedObject):
- """Apply with a probability p the input transformation
-
- Example::
- >>> from doctr.transforms import RandomApply, RandomGamma
- >>> import tensorflow as tf
- >>> transfo = RandomApply(RandomGamma(), p=.5)
- >>> out = transfo(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
-
- Args:
- transform: transformation to apply
- p: probability to apply
- """
- def __init__(self, transform: NestedObject, p: float = .5) -> None:
- self.transform = transform
- self.p = p
-
- def extra_repr(self) -> str:
- return f"transform={self.transform}, p={self.p}"
-
- def __call__(self, img: tf.Tensor) -> tf.Tensor:
- if random.random() < self.p:
- return self.transform(img)
- return img
-
-
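A hedged sketch combining the transformations defined in the module above into a single pipeline, mirroring the Compose and RandomApply semantics of the TensorFlow backend (the parameter values are illustrative):

import tensorflow as tf
from doctr.transforms import Compose, Resize, Normalize, RandomApply, RandomGamma

transfos = Compose([
    Resize((32, 32)),                   # fixed spatial size
    RandomApply(RandomGamma(), p=0.3),  # gamma jitter 30% of the time
    Normalize(mean=(0.5, 0.5, 0.5), std=(1.0, 1.0, 1.0)),
])
out = transfos(tf.random.uniform(shape=[64, 64, 3], minval=0, maxval=1))
print(out.shape)  # (32, 32, 3)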
Source code for doctr.transforms.modules.base
Source code for doctr.transforms.modules.tensorflow
Source code for doctr.utils.metrics
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
+
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+from typing import Dict, List, Optional, Tuple
import numpy as np
-from rapidfuzz.string_metric import levenshtein
-from typing import List, Tuple
+from anyascii import anyascii
from scipy.optimize import linear_sum_assignment
+from shapely.geometry import Polygon
-__all__ = ['ExactMatch', 'box_iou', 'assign_pairs', 'LocalizationConfusion', 'OCRMetric']
+__all__ = [
+ "TextMatch",
+ "box_iou",
+ "polygon_iou",
+ "nms",
+ "LocalizationConfusion",
+ "OCRMetric",
+ "DetectionMetric",
+]
-
-[docs]
-class ExactMatch:
- """Implements exact match metric (word-level accuracy) for recognition task.
+def string_match(word1: str, word2: str) -> Tuple[bool, bool, bool, bool]:
+ """Performs string comparison with multiple levels of tolerance
- The aggregated metric is computed as follows:
+ Args:
+ ----
+ word1: a string
+ word2: another string
- .. math::
- \\forall X, Y \\in \\mathcal{W}^N,
- ExactMatch(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N f_{Y_i}(X_i)
+ Returns:
+ -------
+ a tuple with booleans specifying respectively whether the raw strings, their lower-case counterparts, their
+ anyascii counterparts and their lower-case anyascii counterparts match
+ """
+ raw_match = word1 == word2
+ caseless_match = word1.lower() == word2.lower()
+ anyascii_match = anyascii(word1) == anyascii(word2)
- with the indicator function :math:`f_{a}` defined as:
+ # Warning: the order is important here otherwise the pair ("EUR", "€") cannot be matched
+ unicase_match = anyascii(word1).lower() == anyascii(word2).lower()
- .. math::
- \\forall a, x \\in \\mathcal{W},
- f_a(x) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } x = a \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{W}` is the set of all possible character sequences,
- :math:`N` is a strictly positive integer.
+ return raw_match, caseless_match, anyascii_match, unicase_match
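A quick worked example of the four tolerance levels returned by string_match (raw, caseless, anyascii, unicase); the ("EUR", "€") pair is the one called out in the comment above:

from doctr.utils.metrics import string_match

print(string_match("Hello", "hello"))  # (False, True, False, True)
print(string_match("EUR", "€"))        # (False, False, True, True)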
- Example::
- >>> from doctr.utils import ExactMatch
- >>> metric = ExactMatch()
- >>> metric.update(['Hello', 'world'], ['hello', 'world'])
- >>> metric.summary()
- Args:
- ignore_case: if true, ignore letter case when computing metric
- ignore_accents: if true, ignore accents errors when computing metrics"""
+
+[docs]
+class TextMatch:
+ r"""Implements text match metric (word-level accuracy) for recognition task.
- def __init__(
- self,
- ignore_case: bool = False,
- ignore_accents: bool = False,
- ) -> None:
+ The raw aggregated metric is computed as follows:
- self.matches = 0
- self.total = 0
- self.ignore_case = ignore_case
- self.ignore_accents = ignore_accents
+ .. math::
+ \forall X, Y \in \mathcal{W}^N,
+ TextMatch(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N f_{Y_i}(X_i)
- @staticmethod
- def remove_accent(input_string: str) -> str:
- """Removes all accents (¨^çéè...) from input_string
+ with the indicator function :math:`f_{a}` defined as:
- Args:
- input_string: character sequence with accents
+ .. math::
+ \forall a, x \in \mathcal{W},
+ f_a(x) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } x = a \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{W}` is the set of all possible character sequences,
+ :math:`N` is a strictly positive integer.
- Returns:
- character sequence without accents"""
+ >>> from doctr.utils import TextMatch
+ >>> metric = TextMatch()
+ >>> metric.update(['Hello', 'world'], ['hello', 'world'])
+ >>> metric.summary()
+ """
- raise NotImplementedError
+ def __init__(self) -> None:
+ self.reset()
+
+[docs]
def update(
self,
gt: List[str],
@@ -348,53 +386,66 @@ Source code for doctr.utils.metrics
"""Update the state of the metric with new predictions
Args:
+ ----
gt: list of ground-truth character sequences
- pred: list of predicted character sequences"""
-
+ pred: list of predicted character sequences
+ """
if len(gt) != len(pred):
raise AssertionError("prediction size does not match with ground-truth labels size")
- for pred_word, gt_word in zip(pred, gt):
- if self.ignore_accents:
- gt_word = self.remove_accent(gt_word)
- pred_word = self.remove_accent(pred_word)
-
- if self.ignore_case:
- gt_word = gt_word.lower()
- pred_word = pred_word.lower()
+ for gt_word, pred_word in zip(gt, pred):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_word, pred_word)
+ self.raw += int(_raw)
+ self.caseless += int(_caseless)
+ self.anyascii += int(_anyascii)
+ self.unicase += int(_unicase)
- if pred_word == gt_word:
- self.matches += 1
+ self.total += len(gt)
- self.total += len(gt)
- def summary(self) -> float:
- """Computes the aggregated evaluation
+
+[docs]
+ def summary(self) -> Dict[str, float]:
+ """Computes the aggregated metrics
- Returns:
- metric result"""
+ Returns
+ -------
+ a dictionary with the exact match score for the raw data, its lower-case counterpart, its anyascii
+ counterpart and its lower-case anyascii counterpart
+ """
if self.total == 0:
raise AssertionError("you need to update the metric before getting the summary")
- return self.matches / self.total
+
+ return dict(
+ raw=self.raw / self.total,
+ caseless=self.caseless / self.total,
+ anyascii=self.anyascii / self.total,
+ unicase=self.unicase / self.total,
+ )
+
def reset(self) -> None:
- self.matches = 0
+ self.raw = 0
+ self.caseless = 0
+ self.anyascii = 0
+ self.unicase = 0
self.total = 0
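A worked run of TextMatch above: with gt ['Hello', 'world'] and pred ['hello', 'world'], only the case differs on the first pair, so the raw and anyascii scores are 0.5 while the caseless and unicase scores are 1.0:

from doctr.utils.metrics import TextMatch

metric = TextMatch()
metric.update(["Hello", "world"], ["hello", "world"])
print(metric.summary())
# {'raw': 0.5, 'caseless': 1.0, 'anyascii': 0.5, 'unicase': 1.0}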
def box_iou(boxes_1: np.ndarray, boxes_2: np.ndarray) -> np.ndarray:
- """Compute the IoU between two sets of bounding boxes
+ """Computes the IoU between two sets of bounding boxes
Args:
+ ----
boxes_1: bounding boxes of shape (N, 4) in format (xmin, ymin, xmax, ymax)
boxes_2: bounding boxes of shape (M, 4) in format (xmin, ymin, xmax, ymax)
Returns:
+ -------
the IoU matrix of shape (N, M)
"""
-
- iou_mat = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
+ iou_mat: np.ndarray = np.zeros((boxes_1.shape[0], boxes_2.shape[0]), dtype=np.float32)
if boxes_1.shape[0] > 0 and boxes_2.shape[0] > 0:
l1, t1, r1, b1 = np.split(boxes_1, 4, axis=1)
@@ -405,169 +456,244 @@ Source code for doctr.utils.metrics
right = np.minimum(r1, r2.T)
bot = np.minimum(b1, b2.T)
- intersection = np.clip(right - left, 0, np.Inf) * np.clip(bot - top, 0, np.Inf)
+ intersection = np.clip(right - left, 0, np.inf) * np.clip(bot - top, 0, np.inf)
union = (r1 - l1) * (b1 - t1) + ((r2 - l2) * (b2 - t2)).T - intersection
iou_mat = intersection / union
return iou_mat
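A worked example for box_iou: the first pair of boxes is identical (IoU 1.0), while the second pair intersects on a 1x1 square, giving 1 / (4 + 4 - 1):

import numpy as np
from doctr.utils.metrics import box_iou

a = np.array([[0, 0, 2, 2]], dtype=np.float32)
b = np.array([[0, 0, 2, 2], [1, 1, 3, 3]], dtype=np.float32)
print(box_iou(a, b))  # [[1.0, 0.1428...]]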
-def assign_pairs(score_mat: np.ndarray, score_threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray]:
- """Assigns candidates by maximizing the score of all pairs
+def polygon_iou(polys_1: np.ndarray, polys_2: np.ndarray) -> np.ndarray:
+ """Computes the IoU between two sets of rotated bounding boxes
Args:
- score_mat: score matrix
- score_threshold: minimum score to validate an assignment
+ ----
+ polys_1: rotated bounding boxes of shape (N, 4, 2)
+ polys_2: rotated bounding boxes of shape (M, 4, 2)
+
Returns:
- a tuple of two lists: the list of assigned row candidates indices, and the list of their column counterparts
+ -------
+ the IoU matrix of shape (N, M)
"""
+ if polys_1.ndim != 3 or polys_2.ndim != 3:
+ raise AssertionError("expects boxes to be in format (N, 4, 2)")
+
+ iou_mat = np.zeros((polys_1.shape[0], polys_2.shape[0]), dtype=np.float32)
+
+ shapely_polys_1 = [Polygon(poly) for poly in polys_1]
+ shapely_polys_2 = [Polygon(poly) for poly in polys_2]
+
+ for i, poly1 in enumerate(shapely_polys_1):
+ for j, poly2 in enumerate(shapely_polys_2):
+ intersection_area = poly1.intersection(poly2).area
+ union_area = poly1.area + poly2.area - intersection_area
+ iou_mat[i, j] = intersection_area / union_area
- row_ind, col_ind = linear_sum_assignment(-score_mat)
- is_kept = score_mat[row_ind, col_ind] >= score_threshold
- return row_ind[is_kept], col_ind[is_kept]
+ return iou_mat
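A worked example for polygon_iou: two unit squares offset by half a side intersect on half their area, giving IoU 0.5 / (1 + 1 - 0.5):

import numpy as np
from doctr.utils.metrics import polygon_iou

sq1 = np.array([[[0, 0], [1, 0], [1, 1], [0, 1]]], dtype=np.float32)
sq2 = sq1 + np.array([0.5, 0.0], dtype=np.float32)  # shifted half a side right
print(polygon_iou(sq1, sq2))  # [[0.3333...]]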
+
+
+def nms(boxes: np.ndarray, thresh: float = 0.5) -> List[int]:
+ """Perform non-max suppression, borrowed from <https://github.com/rbgirshick/fast-rcnn>`_.
+
+ Args:
+ ----
+ boxes: np array of straight boxes: (*, 5), (xmin, ymin, xmax, ymax, score)
+ thresh: iou threshold to perform box suppression.
+
+ Returns:
+ -------
+ A list of box indexes to keep
+ """
+ x1 = boxes[:, 0]
+ y1 = boxes[:, 1]
+ x2 = boxes[:, 2]
+ y2 = boxes[:, 3]
+ scores = boxes[:, 4]
+
+ areas = (x2 - x1) * (y2 - y1)
+ order = scores.argsort()[::-1]
+
+ keep = []
+ while order.size > 0:
+ i = order[0]
+ keep.append(i)
+ xx1 = np.maximum(x1[i], x1[order[1:]])
+ yy1 = np.maximum(y1[i], y1[order[1:]])
+ xx2 = np.minimum(x2[i], x2[order[1:]])
+ yy2 = np.minimum(y2[i], y2[order[1:]])
+
+ w = np.maximum(0.0, xx2 - xx1)
+ h = np.maximum(0.0, yy2 - yy1)
+ inter = w * h
+ ovr = inter / (areas[i] + areas[order[1:]] - inter)
+
+ inds = np.where(ovr <= thresh)[0]
+ order = order[inds + 1]
+ return keep
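A usage sketch for nms above (importable from doctr.utils.metrics): the second box overlaps the highest-scoring one with IoU 0.81 and is suppressed, so only indices 0 and 2 survive:

import numpy as np
from doctr.utils.metrics import nms

boxes = np.array([
    [0, 0, 10, 10, 0.9],
    [1, 1, 10, 10, 0.8],   # IoU 0.81 with the first box -> suppressed
    [20, 20, 30, 30, 0.7],
])
print(nms(boxes, thresh=0.5))  # keeps indices [0, 2]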
-[docs]
+[docs]
class LocalizationConfusion:
- """Implements common confusion metrics and mean IoU for localization evaluation.
+ r"""Implements common confusion metrics and mean IoU for localization evaluation.
The aggregated metrics are computed as follows:
.. math::
- \\forall Y \\in \\mathcal{B}^N, \\forall X \\in \\mathcal{B}^M, \\\\
- Recall(X, Y) = \\frac{1}{N} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- Precision(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^N g_{X}(Y_i) \\\\
- meanIoU(X, Y) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(X_i, Y_j)
+ \forall Y \in \mathcal{B}^N, \forall X \in \mathcal{B}^M, \\
+ Recall(X, Y) = \frac{1}{N} \sum\limits_{i=1}^N g_{X}(Y_i) \\
+ Precision(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M g_{X}(Y_i) \\
+ meanIoU(X, Y) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(X_i, Y_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`g_{X}` defined as:
.. math::
- \\forall y \\in \\mathcal{B},
- g_X(y) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } y\\mbox{ has been assigned to any }(X_i)_i\\mbox{ with an }IoU \\geq 0.5 \\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
+ \forall y \in \mathcal{B},
+ g_X(y) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } y\mbox{ has been assigned to any }(X_i)_i\mbox{ with an }IoU \geq 0.5 \\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import LocalizationConfusion
- >>> metric = LocalizationConfusion(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import LocalizationConfusion
+ >>> metric = LocalizationConfusion(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]))
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
- def __init__(self, iou_thresh: float = 0.5) -> None:
-
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
self.iou_thresh = iou_thresh
- self.num_gts = 0
- self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(self, gts: np.ndarray, preds: np.ndarray) -> None:
+ """Updates the metric
+ Args:
+ ----
+ gts: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ preds: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ """
if preds.shape[0] > 0:
# Compute IoU
- iou_mat = box_iou(gts, preds)
- self.tot_iou += float(iou_mat.max(axis=1).sum())
+ if self.use_polygons:
+ iou_mat = polygon_iou(gts, preds)
+ else:
+ iou_mat = box_iou(gts, preds)
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
# Assign pairs
- gt_indices, _ = assign_pairs(iou_mat, self.iou_thresh)
- self.num_matches += len(gt_indices)
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ self.matches += int((iou_mat[gt_indices, pred_indices] >= self.iou_thresh).sum())
# Update counts
self.num_gts += gts.shape[0]
- self.num_preds += preds.shape[0]
+ self.num_preds += preds.shape[0]
- def summary(self) -> Tuple[float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall, precision and meanIoU scores
+ """
# Recall
- recall = self.num_matches / self.num_gts
+ recall = self.matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_matches / self.num_preds
+ precision = self.matches / self.num_preds if self.num_preds > 0 else None
# mean IoU
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
- return recall, precision, mean_iou
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_matches = 0
- self.tot_iou = 0.
+ self.matches = 0
+ self.tot_iou = 0.0
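A worked run of LocalizationConfusion: a single prediction that exactly matches the single ground-truth box yields perfect recall, precision and mean IoU (boxes are illustrative relative coordinates):

import numpy as np
from doctr.utils.metrics import LocalizationConfusion

metric = LocalizationConfusion(iou_thresh=0.5)
metric.update(np.array([[0, 0, 0.5, 0.5]]), np.array([[0, 0, 0.5, 0.5]]))
print(metric.summary())  # (1.0, 1.0, 1.0)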
-[docs]
+[docs]
class OCRMetric:
- """Implements end-to-end OCR metric.
+ r"""Implements an end-to-end OCR metric.
The aggregated metrics are computed as follows:
.. math::
- \\forall (B, L) \\in \\mathcal{B}^N \\times \\mathcal{L}^N,
- \\forall (\\hat{B}, \\hat{L}) \\in \\mathcal{B}^M \\times \\mathcal{L}^M, \\\\
- Recall(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{N} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- Precision(B, \\hat{B}, L, \\hat{L}) = \\frac{1}{M} \\sum\\limits_{i=1}^N h_{B,L}(\\hat{B}_i, \\hat{L}_i) \\\\
- meanIoU(B, \\hat{B}) = \\frac{1}{M} \\sum\\limits_{i=1}^M \\max\\limits_{j \\in [1, N]} IoU(\\hat{B}_i, B_j)
+ \forall (B, L) \in \mathcal{B}^N \times \mathcal{L}^N,
+ \forall (\hat{B}, \hat{L}) \in \mathcal{B}^M \times \mathcal{L}^M, \\
+ Recall(B, \hat{B}, L, \hat{L}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ Precision(B, \hat{B}, L, \hat{L}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,L}(\hat{B}_i, \hat{L}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
:math:`y`, and the function :math:`h_{B, L}` defined as:
.. math::
- \\forall (b, l) \\in \\mathcal{B} \\times \\mathcal{L},
- h_{B,L}(b, l) = \\left\\{
- \\begin{array}{ll}
- 1 & \\mbox{if } b\\mbox{ has been assigned to a given }B_j\\mbox{ with an } \\\\
- & IoU \\geq 0.5 \\mbox{ and that for this assignment, } l = L_j\\\\
- 0 & \\mbox{otherwise.}
- \\end{array}
- \\right.
-
- where :math:`\\mathcal{B}` is the set of possible bounding boxes,
- :math:`\\mathcal{L}` is the set of possible character sequences,
+ \forall (b, l) \in \mathcal{B} \times \mathcal{L},
+ h_{B,L}(b, l) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } l = L_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{L}` is the set of possible character sequences,
:math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
- Example::
- >>> import numpy as np
- >>> from doctr.utils import OCRMetric
- >>> metric = OCRMetric(iou_thresh=0.5)
- >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
- ['hello'], ['hello', 'world'])
- >>> metric.summary()
+ >>> import numpy as np
+ >>> from doctr.utils import OCRMetric
+ >>> metric = OCRMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> ['hello'], ['hello', 'world'])
+ >>> metric.summary()
Args:
+ ----
iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
- max_dist: maximum Levenshtein distance between 2 sequence to consider a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
"""
def __init__(
self,
iou_thresh: float = 0.5,
- max_dist: int = 0
+ use_polygons: bool = False,
) -> None:
-
self.iou_thresh = iou_thresh
- self.max_dist = max_dist
- self.num_gts = 0
- self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.use_polygons = use_polygons
+ self.reset()
+
+[docs]
def update(
self,
gt_boxes: np.ndarray,
@@ -575,52 +701,207 @@ Source code for doctr.utils.metrics
gt_labels: List[str],
pred_labels: List[str],
) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: a list of N string labels
+ pred_labels: a list of M string labels
+ """
+ if gt_boxes.shape[0] != len(gt_labels) or pred_boxes.shape[0] != len(pred_labels):
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
# Compute IoU
- iou_mat = box_iou(gt_boxes, pred_boxes)
- if iou_mat.shape[1] == 0:
- self.tot_iou = 0
- else:
- self.tot_iou += float(iou_mat.max(axis=1).sum())
-
- # Assign pairs
- gt_indices, preds_indices = assign_pairs(iou_mat, self.iou_thresh)
-
- # Compare sequences
- for gt_idx, pred_idx in zip(gt_indices, preds_indices):
- dist = levenshtein(gt_labels[gt_idx], pred_labels[pred_idx])
- self.tot_dist += dist
- if dist <= self.max_dist:
- self.num_reco_matches += 1
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # String comparison
+ for gt_idx, pred_idx in zip(gt_indices[is_kept], pred_indices[is_kept]):
+ _raw, _caseless, _anyascii, _unicase = string_match(gt_labels[gt_idx], pred_labels[pred_idx])
+ self.raw_matches += int(_raw)
+ self.caseless_matches += int(_caseless)
+ self.anyascii_matches += int(_anyascii)
+ self.unicase_matches += int(_unicase)
+
+ self.num_gts += gt_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
+
+
+[docs]
+ def summary(self) -> Tuple[Dict[str, Optional[float]], Dict[str, Optional[float]], Optional[float]]:
+ """Computes the aggregated metrics
+
+ Returns
+ -------
+ a tuple with the recall & precision for each string comparison and the mean IoU
+ """
+ # Recall
+ recall = dict(
+ raw=self.raw_matches / self.num_gts if self.num_gts > 0 else None,
+ caseless=self.caseless_matches / self.num_gts if self.num_gts > 0 else None,
+ anyascii=self.anyascii_matches / self.num_gts if self.num_gts > 0 else None,
+ unicase=self.unicase_matches / self.num_gts if self.num_gts > 0 else None,
+ )
+
+ # Precision
+ precision = dict(
+ raw=self.raw_matches / self.num_preds if self.num_preds > 0 else None,
+ caseless=self.caseless_matches / self.num_preds if self.num_preds > 0 else None,
+ anyascii=self.anyascii_matches / self.num_preds if self.num_preds > 0 else None,
+ unicase=self.unicase_matches / self.num_preds if self.num_preds > 0 else None,
+ )
+
+ # mean IoU (overall detected boxes)
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
+
+ return recall, precision, mean_iou
+
+
+ def reset(self) -> None:
+ self.num_gts = 0
+ self.num_preds = 0
+ self.tot_iou = 0.0
+ self.raw_matches = 0
+ self.caseless_matches = 0
+ self.anyascii_matches = 0
+ self.unicase_matches = 0
+
+
+
+
+[docs]
+class DetectionMetric:
+ r"""Implements an object detection metric.
+
+ The aggregated metrics are computed as follows:
+
+ .. math::
+ \forall (B, C) \in \mathcal{B}^N \times \mathcal{C}^N,
+ \forall (\hat{B}, \hat{C}) \in \mathcal{B}^M \times \mathcal{C}^M, \\
+ Recall(B, \hat{B}, C, \hat{C}) = \frac{1}{N} \sum\limits_{i=1}^N h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ Precision(B, \hat{B}, C, \hat{C}) = \frac{1}{M} \sum\limits_{i=1}^M h_{B,C}(\hat{B}_i, \hat{C}_i) \\
+ meanIoU(B, \hat{B}) = \frac{1}{M} \sum\limits_{i=1}^M \max\limits_{j \in [1, N]} IoU(\hat{B}_i, B_j)
+
+ with the function :math:`IoU(x, y)` being the Intersection over Union between bounding boxes :math:`x` and
+ :math:`y`, and the function :math:`h_{B, C}` defined as:
+
+ .. math::
+ \forall (b, c) \in \mathcal{B} \times \mathcal{C},
+ h_{B,C}(b, c) = \left\{
+ \begin{array}{ll}
+ 1 & \mbox{if } b\mbox{ has been assigned to a given }B_j\mbox{ with an } \\
+ & IoU \geq 0.5 \mbox{ and that for this assignment, } c = C_j\\
+ 0 & \mbox{otherwise.}
+ \end{array}
+ \right.
+
+ where :math:`\mathcal{B}` is the set of possible bounding boxes,
+ :math:`\mathcal{C}` is the set of possible class indices,
+ :math:`N` (number of ground truths) and :math:`M` (number of predictions) are strictly positive integers.
+
+ >>> import numpy as np
+ >>> from doctr.utils import DetectionMetric
+ >>> metric = DetectionMetric(iou_thresh=0.5)
+ >>> metric.update(np.asarray([[0, 0, 100, 100]]), np.asarray([[0, 0, 70, 70], [110, 95, 200, 150]]),
+ >>> np.zeros(1, dtype=np.int64), np.array([0, 1], dtype=np.int64))
+ >>> metric.summary()
+
+ Args:
+ ----
+ iou_thresh: minimum IoU to consider a pair of prediction and ground truth as a match
+ use_polygons: if set to True, predictions and targets will be expected to have rotated format
+ """
+
+ def __init__(
+ self,
+ iou_thresh: float = 0.5,
+ use_polygons: bool = False,
+ ) -> None:
+ self.iou_thresh = iou_thresh
+ self.use_polygons = use_polygons
+ self.reset()
+
+
+[docs]
+ def update(
+ self,
+ gt_boxes: np.ndarray,
+ pred_boxes: np.ndarray,
+ gt_labels: np.ndarray,
+ pred_labels: np.ndarray,
+ ) -> None:
+ """Updates the metric
+
+ Args:
+ ----
+ gt_boxes: a set of relative bounding boxes either of shape (N, 4) or (N, 5) if they are rotated ones
+ pred_boxes: a set of relative bounding boxes either of shape (M, 4) or (M, 5) if they are rotated ones
+ gt_labels: an array of class indices of shape (N,)
+ pred_labels: an array of class indices of shape (M,)
+ """
+ if gt_boxes.shape[0] != gt_labels.shape[0] or pred_boxes.shape[0] != pred_labels.shape[0]:
+ raise AssertionError(
+ "there should be the same number of boxes and string both for the ground truth and the predictions"
+ )
+
+ # Compute IoU
+ if pred_boxes.shape[0] > 0:
+ if self.use_polygons:
+ iou_mat = polygon_iou(gt_boxes, pred_boxes)
+ else:
+ iou_mat = box_iou(gt_boxes, pred_boxes)
+
+ self.tot_iou += float(iou_mat.max(axis=0).sum())
+
+ # Assign pairs
+ gt_indices, pred_indices = linear_sum_assignment(-iou_mat)
+ is_kept = iou_mat[gt_indices, pred_indices] >= self.iou_thresh
+ # Category comparison
+ self.num_matches += int((gt_labels[gt_indices[is_kept]] == pred_labels[pred_indices[is_kept]]).sum())
- # Update counts
- self.num_det_matches = len(gt_indices)
self.num_gts += gt_boxes.shape[0]
- self.num_preds += pred_boxes.shape[0]
+ self.num_preds += pred_boxes.shape[0]
+
- def summary(self) -> Tuple[float, float, float, float]:
+
+[docs]
+ def summary(self) -> Tuple[Optional[float], Optional[float], Optional[float]]:
+ """Computes the aggregated metrics
+ Returns
+ -------
+ a tuple with the recall & precision for each class prediction and the mean IoU
+ """
# Recall
- recall = self.num_reco_matches / self.num_gts
+ recall = self.num_matches / self.num_gts if self.num_gts > 0 else None
# Precision
- precision = self.num_reco_matches / self.num_preds
+ precision = self.num_matches / self.num_preds if self.num_preds > 0 else None
# mean IoU (overall detected boxes)
- mean_iou = self.tot_iou / self.num_preds
+ mean_iou = round(self.tot_iou / self.num_preds, 2) if self.num_preds > 0 else None
- # mean distance (overall detection-matching boxes)
- mean_distance = self.tot_dist / self.num_det_matches
+ return recall, precision, mean_iou
- return recall, precision, mean_iou, mean_distance
def reset(self) -> None:
self.num_gts = 0
self.num_preds = 0
- self.num_det_matches = 0
- self.num_reco_matches = 0
- self.tot_iou = 0.
- self.tot_dist = 0
+ self.tot_iou = 0.0
+ self.num_matches = 0
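Note that summary() degrades gracefully before any update() call: with zero ground truths and zero predictions, every aggregate is None instead of raising a division error. A quick sketch:

>>> from doctr.utils import DetectionMetric
>>> metric = DetectionMetric(iou_thresh=0.5)
>>> metric.summary()  # num_gts == num_preds == 0, so all the guards return None
(None, None, None)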
@@ -654,8 +935,8 @@ Source code for doctr.utils.metrics
-
-
+
Source code for doctr.utils.visualization
-# Copyright (C) 2021, Mindee.
+# Copyright (C) 2021-2024, Mindee.
-# This program is licensed under the Apache License version 2.
-# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.
+# This program is licensed under the Apache License 2.0.
+# See LICENSE or go to <https://opensource.org/licenses/Apache-2.0> for full license details.
+import colorsys
+from copy import deepcopy
+from typing import Any, Dict, List, Optional, Tuple, Union
-import matplotlib.pyplot as plt
-from matplotlib.figure import Figure
+import cv2
import matplotlib.patches as patches
-import mplcursors
+import matplotlib.pyplot as plt
import numpy as np
-from typing import Tuple, List, Dict, Any
+from matplotlib.figure import Figure
-from .common_types import BoundingBox
+from .common_types import BoundingBox, Polygon4P
-__all__ = ['visualize_page']
+__all__ = ["visualize_page", "visualize_kie_page", "draw_boxes"]
-def create_rect_patch(
+def rect_patch(
geometry: BoundingBox,
- label: str,
page_dimensions: Tuple[int, int],
- color: Tuple[int, int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
alpha: float = 0.3,
linewidth: int = 2,
fill: bool = True,
-) -> patches.Patch:
- """Create a matplotlib patch (rectangle) bounding the element
+ preserve_aspect_ratio: bool = False,
+) -> patches.Rectangle:
+ """Create a matplotlib rectangular patch for the element
Args:
+ ----
geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
label: label to display when hovered
- page_dimensions: dimensions of the Page
color: color to draw box
alpha: opacity parameter to fill the boxes, 0 = transparent
linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
Returns:
+ -------
a rectangular Patch
"""
- h, w = page_dimensions
+ if len(geometry) != 2 or any(not isinstance(elt, tuple) or len(elt) != 2 for elt in geometry):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
(xmin, ymin), (xmax, ymax) = geometry
- xmin, xmax = xmin * w, xmax * w
- ymin, ymax = ymin * h, ymax * h
- rect = patches.Rectangle(
+ # Switch to absolute coords
+ if preserve_aspect_ratio:
+ width = height = max(height, width)
+ xmin, w = xmin * width, (xmax - xmin) * width
+ ymin, h = ymin * height, (ymax - ymin) * height
+
+ return patches.Rectangle(
(xmin, ymin),
- xmax - xmin,
- ymax - ymin,
+ w,
+ h,
+ fill=fill,
+ linewidth=linewidth,
+ edgecolor=(*color, alpha),
+ facecolor=(*color, alpha),
+ label=label,
+ )
+
+
+def polygon_patch(
+ geometry: np.ndarray,
+ page_dimensions: Tuple[int, int],
+ label: Optional[str] = None,
+ color: Tuple[float, float, float] = (0, 0, 0),
+ alpha: float = 0.3,
+ linewidth: int = 2,
+ fill: bool = True,
+ preserve_aspect_ratio: bool = False,
+) -> patches.Polygon:
+ """Create a matplotlib polygon patch for the element
+
+ Args:
+ ----
+ geometry: bounding box of the element
+ page_dimensions: dimensions of the Page in format (height, width)
+ label: label to display when hovered
+ color: color to draw box
+ alpha: opacity parameter to fill the boxes, 0 = transparent
+ linewidth: line width
+ fill: whether the patch should be filled
+ preserve_aspect_ratio: pass True if you passed True to the predictor
+
+ Returns:
+ -------
+ a polygon Patch
+ """
+ if not geometry.shape == (4, 2):
+ raise ValueError("invalid geometry format")
+
+ # Unpack
+ height, width = page_dimensions
+ geometry[:, 0] = geometry[:, 0] * (max(width, height) if preserve_aspect_ratio else width)
+ geometry[:, 1] = geometry[:, 1] * (max(width, height) if preserve_aspect_ratio else height)
+
+ return patches.Polygon(
+ geometry,
fill=fill,
linewidth=linewidth,
edgecolor=(*color, alpha),
facecolor=(*color, alpha),
- label=label
+ label=label,
)
- return rect
+
+
+def create_obj_patch(
+ geometry: Union[BoundingBox, Polygon4P, np.ndarray],
+ page_dimensions: Tuple[int, int],
+ **kwargs: Any,
+) -> patches.Patch:
+ """Create a matplotlib patch for the element
+
+ Args:
+ ----
+ geometry: bounding box (straight or rotated) of the element
+ page_dimensions: dimensions of the page in format (height, width)
+ **kwargs: keyword arguments for the patch
+
+ Returns:
+ -------
+ a matplotlib Patch
+ """
+ if isinstance(geometry, tuple):
+ if len(geometry) == 2: # straight word BB (2 pts)
+ return rect_patch(geometry, page_dimensions, **kwargs)
+ elif len(geometry) == 4: # rotated word BB (4 pts)
+ return polygon_patch(np.asarray(geometry), page_dimensions, **kwargs)
+ elif isinstance(geometry, np.ndarray) and geometry.shape == (4, 2): # rotated line
+ return polygon_patch(geometry, page_dimensions, **kwargs)
+ raise ValueError("invalid geometry format")
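A short sketch of the dispatch above, with hypothetical relative coordinates and a page of height 200 and width 300 (note that polygon_patch scales its input array in place):

>>> import numpy as np
>>> rect = create_obj_patch(((0.1, 0.1), (0.4, 0.3)), (200, 300))  # 2-point tuple -> rect_patch
>>> poly = create_obj_patch(np.array([[0.1, 0.1], [0.4, 0.1], [0.4, 0.3], [0.1, 0.3]]), (200, 300))  # (4, 2) array -> polygon_patch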
+
+
+def get_colors(num_colors: int) -> List[Tuple[float, float, float]]:
+ """Generate num_colors color for matplotlib
+
+ Args:
+ ----
+ num_colors: number of colors to generate
+
+ Returns:
+ -------
+ colors: list of generated colors
+ """
+ colors = []
+ for i in np.arange(0.0, 360.0, 360.0 / num_colors):
+ hue = i / 360.0
+ lightness = (50 + np.random.rand() * 10) / 100.0
+ saturation = (90 + np.random.rand() * 10) / 100.0
+ colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
+ return colors
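For example (a minimal check of the helper above; the exact RGB values vary between calls because lightness and saturation are randomized):

>>> colors = get_colors(3)  # hues sampled at 0, 120 and 240 degrees
>>> len(colors)
3
>>> all(0.0 <= channel <= 1.0 for rgb in colors for channel in rgb)
True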
-[docs]
+[docs]
def visualize_page(
page: Dict[str, Any],
image: np.ndarray,
words_only: bool = True,
+ display_artefacts: bool = True,
scale: float = 10,
interactive: bool = True,
add_labels: bool = True,
@@ -338,22 +472,30 @@ Source code for doctr.utils.visualization
) -> Figure:
"""Visualize a full page with predicted blocks, lines and words
- Example::
- >>> import numpy as np
- >>> import matplotlib.pyplot as plt
- >>> from doctr.utils.visualization import visualize_page
- >>> from doctr.models import ocr_db_crnn
- >>> model = ocr_db_crnn(pretrained=True)
- >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
- >>> out = model([[input_page]])
- >>> visualize_page(out[0].pages[0].export(), input_page)
- >>> plt.show()
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
Args:
+ ----
page: the exported Page of a Document
image: np array of the page, needs to have the same shape as page['dimensions']
words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plots, adds text labels on top of the bounding boxes
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
"""
# Get proper scale and aspect ratio
h, w = image.shape[:2]
@@ -362,58 +504,189 @@ Source code for doctr.utils.visualization
# Display the image
ax.imshow(image)
# hide both axis
- ax.axis('off')
+ ax.axis("off")
if interactive:
artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
- for block in page['blocks']:
+ for block in page["blocks"]:
if not words_only:
- rect = create_rect_patch(block['geometry'], 'block', page['dimensions'], (0, 1, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ block["geometry"], page["dimensions"], label="block", color=(0, 1, 0), linewidth=1, **kwargs
+ )
# add patch on figure
ax.add_patch(rect)
if interactive:
# add patch to cursor's artists
artists.append(rect)
- for line in block['lines']:
+ for line in block["lines"]:
if not words_only:
- rect = create_rect_patch(line['geometry'], 'line', page['dimensions'], (1, 0, 0), linewidth=1, **kwargs)
+ rect = create_obj_patch(
+ line["geometry"], page["dimensions"], label="line", color=(1, 0, 0), linewidth=1, **kwargs
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
- for word in line['words']:
- rect = create_rect_patch(word['geometry'], f"{word['value']} (confidence: {word['confidence']:.2%})",
- page['dimensions'], (0, 0, 1), **kwargs)
+ for word in line["words"]:
+ rect = create_obj_patch(
+ word["geometry"],
+ page["dimensions"],
+ label=f"{word['value']} (confidence: {word['confidence']:.2%})",
+ color=(0, 0, 1),
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
elif add_labels:
- ax.text(
- int(page['dimensions'][1] * word['geometry'][0][0]),
- int(page['dimensions'][0] * word['geometry'][0][1]),
- word['value'],
- size=10,
- alpha=0.5,
- color=(0, 0, 1),
- )
+ if len(word["geometry"]) == 5:
+ text_loc = (
+ int(page["dimensions"][1] * (word["geometry"][0] - word["geometry"][2] / 2)),
+ int(page["dimensions"][0] * (word["geometry"][1] - word["geometry"][3] / 2)),
+ )
+ else:
+ text_loc = (
+ int(page["dimensions"][1] * word["geometry"][0][0]),
+ int(page["dimensions"][0] * word["geometry"][0][1]),
+ )
- if not words_only:
- for artefact in block['artefacts']:
- rect = create_rect_patch(artefact['geometry'], 'artefact', page['dimensions'], (0.5, 0.5, 0.5),
- linewidth=1, **kwargs)
+ if len(word["geometry"]) == 2:
+ # We draw only if boxes are in straight format
+ ax.text(
+ *text_loc,
+ word["value"],
+ size=10,
+ alpha=0.5,
+ color=(0, 0, 1),
+ )
+
+ if display_artefacts:
+ for artefact in block["artefacts"]:
+ rect = create_obj_patch(
+ artefact["geometry"],
+ page["dimensions"],
+ label="artefact",
+ color=(0.5, 0.5, 0.5),
+ linewidth=1,
+ **kwargs,
+ )
ax.add_patch(rect)
if interactive:
artists.append(rect)
if interactive:
+ import mplcursors
+
# Create an mplcursors Cursor to hover patches in artists
mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
- fig.tight_layout()
+ fig.tight_layout(pad=0.0)
return fig
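A minimal non-interactive sketch of calling the function above (interactive=False avoids the mplcursors import); page_export and input_page are hypothetical placeholders for a Page export dict and its matching image array:

>>> import matplotlib.pyplot as plt
>>> fig = visualize_page(page_export, input_page, interactive=False, add_labels=True)
>>> plt.show()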
+
+
+def visualize_kie_page(
+ page: Dict[str, Any],
+ image: np.ndarray,
+ words_only: bool = False,
+ display_artefacts: bool = True,
+ scale: float = 10,
+ interactive: bool = True,
+ add_labels: bool = True,
+ **kwargs: Any,
+) -> Figure:
+ """Visualize a full page with predicted blocks, lines and words
+
+ >>> import numpy as np
+ >>> import matplotlib.pyplot as plt
+ >>> from doctr.utils.visualization import visualize_kie_page
+ >>> from doctr.models import ocr_db_crnn
+ >>> model = ocr_db_crnn(pretrained=True)
+ >>> input_page = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)
+ >>> out = model([[input_page]])
+ >>> visualize_kie_page(out[0].pages[0].export(), input_page)
+ >>> plt.show()
+
+ Args:
+ ----
+ page: the exported Page of a Document
+ image: np array of the page, needs to have the same shape as page['dimensions']
+ words_only: whether only words should be displayed
+ display_artefacts: whether artefacts should be displayed
+ scale: figsize of the largest windows side
+ interactive: whether the plot should be interactive
+ add_labels: for static plots, adds text labels on top of the bounding boxes
+ **kwargs: keyword arguments for the polygon patch
+
+ Returns:
+ -------
+ the matplotlib figure
+ """
+ # Get proper scale and aspect ratio
+ h, w = image.shape[:2]
+ size = (scale * w / h, scale) if h > w else (scale, h / w * scale)
+ fig, ax = plt.subplots(figsize=size)
+ # Display the image
+ ax.imshow(image)
+ # hide both axis
+ ax.axis("off")
+
+ if interactive:
+ artists: List[patches.Patch] = [] # instantiate an empty list of patches (to be drawn on the page)
+
+ colors = {k: color for color, k in zip(get_colors(len(page["predictions"])), page["predictions"])}
+ for key, value in page["predictions"].items():
+ for prediction in value:
+ if not words_only:
+ rect = create_obj_patch(
+ prediction["geometry"],
+ page["dimensions"],
+ label=f"{key} \n {prediction['value']} (confidence: {prediction['confidence']:.2%}",
+ color=colors[key],
+ linewidth=1,
+ **kwargs,
+ )
+ # add patch on figure
+ ax.add_patch(rect)
+ if interactive:
+ # add patch to cursor's artists
+ artists.append(rect)
+
+ if interactive:
+ import mplcursors
+
+ # Create an mplcursors Cursor to hover patches in artists
+ mplcursors.Cursor(artists, hover=2).connect("add", lambda sel: sel.annotation.set_text(sel.artist.get_label()))
+ fig.tight_layout(pad=0.0)
+
+ return fig
+
+
+def draw_boxes(boxes: np.ndarray, image: np.ndarray, color: Optional[Tuple[int, int, int]] = None, **kwargs) -> None:
+ """Draw an array of relative straight boxes on an image
+
+ Args:
+ ----
+ boxes: array of relative boxes, of shape (*, 4)
+ image: np array, float32 or uint8
+ color: color to use for bounding box edges
+ **kwargs: keyword arguments from `matplotlib.pyplot.plot`
+ """
+ h, w = image.shape[:2]
+ # Convert boxes to absolute coords
+ _boxes = deepcopy(boxes)
+ _boxes[:, [0, 2]] *= w
+ _boxes[:, [1, 3]] *= h
+ _boxes = _boxes.astype(np.int32)
+ for box in _boxes.tolist():
+ xmin, ymin, xmax, ymax = box
+ image = cv2.rectangle(
+ image, (xmin, ymin), (xmax, ymax), color=color if isinstance(color, tuple) else (0, 0, 255), thickness=2
+ )
+ plt.imshow(image)
+ plt.plot(**kwargs)
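And a quick sketch of draw_boxes with hypothetical inputs (boxes are relative (xmin, ymin, xmax, ymax) coordinates; the function draws with OpenCV, then displays via matplotlib):

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> image = np.zeros((100, 200, 3), dtype=np.uint8)
>>> draw_boxes(np.array([[0.1, 0.2, 0.5, 0.8]]), image, color=(0, 255, 0))
>>> plt.show()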
@@ -446,8 +719,8 @@ Source code for doctr.utils.visualization
-
v0.1.0 (2021-03-05)
-
+
diff --git a/v0.2.0/community/resources.html b/v0.2.0/community/resources.html
index 2564037893..9a1988258c 100644
--- a/v0.2.0/community/resources.html
+++ b/v0.2.0/community/resources.html
@@ -14,7 +14,7 @@
-
+
Community resources - docTR documentation
@@ -389,7 +389,7 @@ Community resources
-
+
diff --git a/v0.2.0/contributing/code_of_conduct.html b/v0.2.0/contributing/code_of_conduct.html
index 5ea4a1f99d..03422dbb4d 100644
--- a/v0.2.0/contributing/code_of_conduct.html
+++ b/v0.2.0/contributing/code_of_conduct.html
@@ -14,7 +14,7 @@
-
+
Contributor Covenant Code of Conduct - docTR documentation
@@ -504,7 +504,7 @@ Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
- doctr.datasets - docTR documentation
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset forPost-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-..autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = CORD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before passing it to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-¶
-
-
-
-
-
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
- doctr.documents - docTR documentation
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-size (the page's)
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degress and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
Attribution
-
+
diff --git a/v0.2.0/contributing/contributing.html b/v0.2.0/contributing/contributing.html
index e5a85682c6..05e2b3641b 100644
--- a/v0.2.0/contributing/contributing.html
+++ b/v0.2.0/contributing/contributing.html
@@ -14,7 +14,7 @@
-
+
Contributing to docTR - docTR documentation
@@ -481,7 +481,7 @@ Let’s connect
-
+
diff --git a/v0.2.0/datasets.html b/v0.2.0/datasets.html
deleted file mode 100644
index 766f224a12..0000000000
--- a/v0.2.0/datasets.html
+++ /dev/null
@@ -1,564 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.datasets - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
-
-doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework
-can be a significant save of time.
-
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
-
--
-class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶
-Implements an abstract dataset
-
-- Parameters:
-
-url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
-
-
-
-Here are all datasets that are available through DocTR:
-
--
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-
-- Example::
>>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-
-- Example::
>>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-
--
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset forPost-OCR Parsing”.
-
-- Example::
>>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-
-
-
-
-
-- Parameters:
-
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-
-
-
-
-
-..autoclass:: OCRDataset
-
-
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-
--
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-
-- Example::
>>> from doctr.datasets import FUNSD, DataLoader
->>> train_set = CORD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-
-
-
-
-
-- Parameters:
-
-dataset – the dataset
-shuffle – whether the samples should be shuffled before passing it to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
-
-
-
-
-
-
-
-Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret them efficiently, DocTR supports multiple sets
-of vocabs.
-
-
-¶
-
-
-
-
-
-
-Name
-size
-characters
-
-
-
-digits
-10
-0123456789
-
-ascii_letters
-52
-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-
-punctuation
-32
-!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
-currency
-5
-£€¥¢฿
-
-latin
-96
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-
-french
-154
-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!”#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
-
-
-
-
-
--
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) ndarray [source]¶
-Encode character sequences using a given vocab as mapping
-
-- Parameters:
-
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-
-
-- Returns:
-the padded encoded data as a tensor
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/documents.html b/v0.2.0/documents.html
deleted file mode 100644
index a7450d8048..0000000000
--- a/v0.2.0/documents.html
+++ /dev/null
@@ -1,736 +0,0 @@
-
-
-
-
-
-
-
-
-
-
-
-
- doctr.documents - docTR documentation
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Skip to content
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Back to top
-
-
-
-
-doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis
-results to structured formats.
-
-Document structure¶
-Structural organization of the documents.
-
-Word¶
-A Word is an uninterrupted sequence of characters.
-
--
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-
-- Parameters:
-
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-size (the page's)
-
-
-
-
-
-
-
-Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, on the same horizontal, we will consider that there are two Lines).
-
--
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-
-- Parameters:
-
-words – list of word elements
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all words in it.
-
-
-
-
-
-
-
-Artefact¶
-An Artefact is a non-textual element (e.g. QR code, picture, chart, signature, logo, etc.).
-
--
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-
-- Parameters:
-
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size.
-
-
-
-
-
-
-
-Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-
--
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-
-- Parameters:
-
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to
-the page’s size. If not specified, it will be resolved by default to the smallest bounding box enclosing
-all lines and artefacts in it.
-
-
-
-
-
-
-
-Page¶
-A Page is a collection of Blocks that were on the same physical page.
-
--
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-
-- Parameters:
-
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degress and confidence of the prediction
-language – a dictionary with the language value and confidence of the prediction
-
-
-
-
-
-
-
-Document¶
-A Document is a collection of Pages.
-
-
-
-
-
-File reading¶
-High-performance file reading and conversion to processable structured data.
-
--
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) Document [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) ndarray [source]¶
-Read an image file into numpy format
-
-- Example::
>>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-
-
-
-
-
-- Parameters:
-
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-
-
-- Returns:
-the page decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-doctr.documents.read_html(url: str, **kwargs: Any) bytes [source]¶
-Read a PDF file and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – URL of the target web page
-
-- Returns:
-decoded PDF file as a bytes stream
-
-
-
-
-
--
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-
--
-classmethod from_pdf(file: str | Path | bytes, **kwargs) PDF [source]¶
-Read a PDF file
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-
-
-
-
-
-- Parameters:
-file – the path to the PDF file or a binary stream
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_url(url: str, **kwargs) PDF [source]¶
-Interpret a web page as a PDF document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-
-
-
-
-
-- Parameters:
-url – the URL of the target web page
-
-- Returns:
-a PDF document
-
-
-
-
-
--
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) List[ndarray] [source]¶
-Read an image file (or a collection of image files) and convert it into an image in numpy format
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-
-
-
-
-
-- Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
-
-
--
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-
-- Parameters:
-doc – input PDF document
-
-
-
--
-as_images(**kwargs) List[ndarray] [source]¶
-Convert all document pages to images
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-
-- Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-
-
-
-
-
--
-get_words(**kwargs) List[List[Tuple[Tuple[float, float, float, float], str]]] [source]¶
-Get the annotations for all words in the document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-
-
-
-
-
-- Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-
-- Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
-
-
-
-
-
--
-get_artefacts() List[List[Tuple[float, float, float, float]]] [source]¶
-Get the artefacts for the entire document
-
-- Example::
>>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-
-
-
-
-
-- Returns:
-the list of pages artefacts, represented as a list of bounding boxes
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file
diff --git a/v0.2.0/genindex.html b/v0.2.0/genindex.html
index 7be65c62d4..21520455b4 100644
--- a/v0.2.0/genindex.html
+++ b/v0.2.0/genindex.html
@@ -13,7 +13,7 @@
- Index - docTR documentation
+ Index - docTR documentation
@@ -224,15 +224,42 @@
-
-
+
+
diff --git a/v0.2.0/getting_started/installing.html b/v0.2.0/getting_started/installing.html
index a488e9a030..af3b58193e 100644
--- a/v0.2.0/getting_started/installing.html
+++ b/v0.2.0/getting_started/installing.html
@@ -14,7 +14,7 @@
-
+
Installation - docTR documentation
@@ -305,7 +305,7 @@
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures speed & performances with state-of-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset forPost-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+
@@ -364,7 +381,7 @@ Contents
doctr.datasets¶
-Whether it is for training or for evaluation, having predefined objects to access datasets in your prefered framework -can be a significant save of time.
-Available Datasets¶
-The datasets from DocTR inherit from an abstract class that handles verified downloading from a given URL.
--
-
- -class doctr.datasets.core.VisionDataset(url: str, file_name: str | None = None, file_hash: str | None = None, extract_archive: bool = False, download: bool = False, overwrite: bool = False)[source]¶ -
Implements an abstract dataset
--
-
- Parameters: -
-
-
url – URL of the dataset
-file_name – name of the file once downloaded
-file_hash – expected SHA256 of the file
-extract_archive – whether the downloaded file is an archive to be extracted
-download – whether the dataset should be downloaded if not present on disk
-overwrite – whether the archive should be re-extracted
-
-
Here are all datasets that are available through DocTR:
-class doctr.datasets.FUNSD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-FUNSD dataset from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
-Example:
->>> from doctr.datasets import FUNSD
->>> train_set = FUNSD(train=True, download=True)
->>> img, target = train_set[0]
-Parameters:
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-class doctr.datasets.SROIE(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-SROIE dataset from “ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction”.
-Example:
->>> from doctr.datasets import SROIE
->>> train_set = SROIE(train=True, download=True)
->>> img, target = train_set[0]
-Parameters:
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
-class doctr.datasets.CORD(train: bool = True, sample_transforms: Callable[[Tensor], Tensor] | None = None, **kwargs: Any)[source]¶
-CORD dataset from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
-Example:
->>> from doctr.datasets import CORD
->>> train_set = CORD(train=True, download=True)
->>> img, target = train_set[0]
-Parameters:
-train – whether the subset should be the training one
-sample_transforms – composable transformations that will be applied to each image
-**kwargs – keyword arguments from VisionDataset.
.. autoclass:: OCRDataset
-Data Loading¶
-Each dataset has its specific way to load a sample, but handling batch aggregation and the underlying iterator is a task deferred to another object in DocTR.
-class doctr.datasets.loader.DataLoader(dataset, shuffle: bool = True, batch_size: int = 1, drop_last: bool = False, workers: int | None = None)[source]¶
-Implements a dataset wrapper for fast data loading
-Example:
->>> from doctr.datasets import CORD, DataLoader
->>> train_set = CORD(train=True, download=True)
->>> train_loader = DataLoader(train_set, batch_size=32)
->>> train_iter = iter(train_loader)
->>> images, targets = next(train_iter)
-Parameters:
-dataset – the dataset
-shuffle – whether the samples should be shuffled before passing it to the iterator
-batch_size – number of elements in each batch
-drop_last – if True, drops the last batch if it isn’t full
-workers – number of workers to use for data loading
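Consuming the loader for a full epoch is then a plain for loop; a short sketch reusing train_set from the example above:
>>> for images, targets in DataLoader(train_set, batch_size=16, shuffle=True, drop_last=True):
...     pass  # run a training step on the batch here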
Supported Vocabs¶
-Since textual content has to be encoded properly for models to interpret it efficiently, DocTR supports multiple sets of vocabs.
-Name | size | characters
-digits | 10 | 0123456789
-ascii_letters | 52 | abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
-punctuation | 32 | !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-currency | 5 | £€¥¢฿
-latin | 96 | 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°
-french | 154 | 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~°àâéèêëîïôùûçÀÂÉÈËÎÏÔÙÛÇ£€¥¢฿
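As a quick sanity check of the sizes listed above (assuming the vocab strings are exposed through the VOCABS mapping of doctr.datasets, which may vary across versions):
>>> from doctr.datasets import VOCABS
>>> len(VOCABS["french"])  # 154 in the table above; may differ in newer releases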
-doctr.datasets.encode_sequences(sequences: List[str], vocab: str, target_size: int | None = None, eos: int = -1, **kwargs: Any) → ndarray[source]¶
-Encode character sequences using a given vocab as mapping
-Parameters:
-sequences – the list of character sequences of size N
-vocab – the ordered vocab to use for encoding
-target_size – maximum length of the encoded data
-eos – encoding of End Of String
-Returns:
-the padded encoded data as a numpy array
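A short sketch of the mapping in action, with an explicit vocab string (padding and eos conventions may differ across versions):
>>> from doctr.datasets import encode_sequences
>>> encoded = encode_sequences(sequences=["cat", "dog"], vocab="abcdefghijklmnopqrstuvwxyz", target_size=5)
>>> encoded.shape
(2, 5)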
doctr.documents¶
-The documents module enables users to easily access content from documents and export analysis results to structured formats.
-Document structure¶
-Structural organization of the documents.
-Word¶
-A Word is an uninterrupted sequence of characters.
-class doctr.documents.Word(value: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a word element
-Parameters:
-value – the text string of the word
-confidence – the confidence associated with the text prediction
-geometry – bounding box of the word in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to the page's size
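Since coordinates are relative, pixel boxes must first be normalized by the page dimensions; a small illustration with made-up numbers:
>>> from doctr.documents import Word
>>> w, h = 600, 800  # hypothetical page width and height in pixels
>>> word = Word("invoice", 0.92, ((120 / w, 80 / h), (240 / w, 120 / h)))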
Line¶
-A Line is a collection of Words aligned spatially and meant to be read together (on a two-column page, words on the same horizontal level but in different columns belong to two distinct Lines).
-class doctr.documents.Line(words: List[Word], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a line element as a collection of words
-Parameters:
-words – list of word elements
-geometry – bounding box of the line in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing all words in it.
Artefact¶
-An Artefact is a non-textual element (e.g. a QR code, picture, chart, signature, logo, etc.).
-class doctr.documents.Artefact(artefact_type: str, confidence: float, geometry: Tuple[Tuple[float, float], Tuple[float, float]])[source]¶
-Implements a non-textual element
-Parameters:
-artefact_type – the type of artefact
-confidence – the confidence of the type prediction
-geometry – bounding box of the artefact in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to the page's size.
Block¶
-A Block is a collection of Lines (e.g. an address written on several lines) and Artefacts (e.g. a graph with its title underneath).
-class doctr.documents.Block(lines: List[Line] = [], artefacts: List[Artefact] = [], geometry: Tuple[Tuple[float, float], Tuple[float, float]] | None = None)[source]¶
-Implements a block element as a collection of lines and artefacts
-Parameters:
-lines – list of line elements
-artefacts – list of artefacts
-geometry – bounding box of the block in format ((xmin, ymin), (xmax, ymax)) where coordinates are relative to the page's size. If not specified, it will be resolved by default to the smallest bounding box enclosing all lines and artefacts in it.
Page¶
-A Page is a collection of Blocks that were on the same physical page.
-class doctr.documents.Page(blocks: List[Block], page_idx: int, dimensions: Tuple[int, int], orientation: Dict[str, Any] | None = None, language: Dict[str, Any] | None = None)[source]¶
-Implements a page element as a collection of blocks
-Parameters:
-blocks – list of block elements
-page_idx – the index of the page in the input raw document
-dimensions – the page size in pixels in format (width, height)
-orientation – a dictionary with the value of the rotation angle in degrees and the confidence of the prediction
-language – a dictionary with the language value and the confidence of the prediction
Document¶
-A Document is a collection of Pages.
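To make the hierarchy concrete, here is a minimal sketch assembling a one-word document by hand (geometry values are made up, and the Document constructor is assumed to simply wrap the list of pages):
>>> from doctr.documents import Word, Line, Block, Page, Document
>>> word = Word("hello", 0.99, ((0.1, 0.1), (0.3, 0.15)))
>>> line = Line([word])  # enclosing geometry is resolved automatically
>>> block = Block(lines=[line])
>>> page = Page(blocks=[block], page_idx=0, dimensions=(595, 842))
>>> doc = Document(pages=[page])  # assumption: Document takes the list of pages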
-File reading¶
-High-performance file reading and conversion to processable structured data.
-doctr.documents.read_pdf(file: str | Path | bytes, **kwargs: Any) → Document[source]¶
-Read a PDF file and convert it into an image in numpy format
-Example:
->>> from doctr.documents import read_pdf
->>> doc = read_pdf("path/to/your/doc.pdf")
-Parameters:
-file – the path to the PDF file
-Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-doctr.documents.read_img(file: str | Path | bytes, output_size: Tuple[int, int] | None = None, rgb_output: bool = True) → ndarray[source]¶
-Read an image file into numpy format
-Example:
->>> from doctr.documents import read_img
->>> page = read_img("path/to/your/doc.jpg")
-Parameters:
-file – the path to the image file
-output_size – the expected output size of each page in format H x W
-rgb_output – whether the output ndarray channel order should be RGB instead of BGR.
-Returns:
-the page decoded as numpy ndarray of shape H x W x 3
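The optional arguments control the decoded array directly; for instance (the file path is a placeholder):
>>> from doctr.documents import read_img
>>> page = read_img("path/to/your/doc.jpg", output_size=(1024, 1024), rgb_output=True)
>>> page.shape
(1024, 1024, 3)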
-doctr.documents.read_html(url: str, **kwargs: Any) → bytes[source]¶
-Read a web page and convert it into a PDF file as a bytes stream
-Example:
->>> from doctr.documents import read_html
->>> doc = read_html("https://www.yoursite.com")
-Parameters:
-url – URL of the target web page
-Returns:
-decoded PDF file as a bytes stream
-class doctr.documents.DocumentFile[source]¶
-Read a document from multiple extensions
-classmethod from_pdf(file: str | Path | bytes, **kwargs) → PDF[source]¶
-Read a PDF file
-Example:
->>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
-Parameters:
-file – the path to the PDF file or a binary stream
-Returns:
-a PDF document
-classmethod from_url(url: str, **kwargs) → PDF[source]¶
-Interpret a web page as a PDF document
-Example:
->>> from doctr.documents import DocumentFile
->>> doc = DocumentFile.from_url("https://www.yoursite.com")
-Parameters:
-url – the URL of the target web page
-Returns:
-a PDF document
-classmethod from_images(files: Sequence[str | Path | bytes] | str | Path | bytes, **kwargs) → List[ndarray][source]¶
-Read an image file (or a collection of image files) and convert it into an image in numpy format
-Example:
->>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_images(["path/to/your/page1.png", "path/to/your/page2.png"])
-Parameters:
-files – the path to the image file or a binary stream, or a collection of those
-Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-class doctr.documents.PDF(doc: Document)[source]¶
-PDF document template
-Parameters:
-doc – input PDF document
-as_images(**kwargs) → List[ndarray][source]¶
-Convert all document pages to images
-Example:
->>> from doctr.documents import DocumentFile
->>> pages = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
-Parameters:
-kwargs – keyword arguments of convert_page_to_numpy
-Returns:
-the list of pages decoded as numpy ndarray of shape H x W x 3
-get_words(**kwargs) → List[List[Tuple[Tuple[float, float, float, float], str]]][source]¶
-Get the annotations for all words in the document
-Example:
->>> from doctr.documents import DocumentFile
->>> words = DocumentFile.from_pdf("path/to/your/doc.pdf").get_words()
-Parameters:
-kwargs – keyword arguments of fitz.Page.getTextWords
-Returns:
-the list of pages annotations, represented as a list of tuple (bounding box, value)
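Each page entry is a list of (bounding box, value) tuples, so the result can be unpacked directly (reusing words from the example above):
>>> for (xmin, ymin, xmax, ymax), value in words[0]:  # first page
...     print(value)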
-get_artefacts() → List[List[Tuple[float, float, float, float]]][source]¶
-Get the artefacts for the entire document
-Example:
->>> from doctr.documents import DocumentFile
->>> artefacts = DocumentFile.from_pdf("path/to/your/doc.pdf").get_artefacts()
-Returns:
-the list of pages artefacts, represented as a list of bounding boxes
Installation¶
-This library requires Python 3.9 or higher.
+This library requires Python 3.10 or higher.
Prerequisites¶
Whichever OS you are running, you will need to install at least TensorFlow or PyTorch. You can refer to their corresponding installation pages to do so:
@@ -435,7 +435,7 @@ Via Git¶
-
+
diff --git a/v0.2.0/index.html b/v0.2.0/index.html
index 19218e24cf..3a06afc6d9 100644
--- a/v0.2.0/index.html
+++ b/v0.2.0/index.html
@@ -12,9 +12,9 @@
gtag('js', new Date());
gtag('config', 'G-40DVRMX8T4');
-
+
-
+
docTR documentation
@@ -226,15 +226,42 @@
-DocTR: Document Text Recognition¶
-State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2
+docTR: Document Text Recognition¶
+State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
+
DocTR provides an easy and powerful way to extract valuable information from your documents:
-🧾 for automation: seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
+🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
👩🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
-This is the documentation of our repository doctr.
-
-Features¶
+
+Main Features¶
-🤖 Robust 2-stages (detection + recognition) OCR predictors fully trained
+🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
-🚀 State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract
-⚡ Predictors optimized to be very fast on both CPU & GPU
-🐦 Light package, small dependencies
-🛠️ Daily maintained
-🏭 Easily integrable
+🚀 State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract
+⚡ Optimized for inference speed on both CPU & GPU
+🐦 Light package, minimal dependencies
+🛠️ Actively maintained by Mindee
+🏭 Easy integration (available templates for browser demo & API deployment)
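The "3 lines of code" claim above corresponds to this usage pattern (a sketch assuming the doctr.io and doctr.models entry points of recent releases):
>>> from doctr.io import DocumentFile
>>> from doctr.models import ocr_predictor
>>> model = ocr_predictor(pretrained=True)
>>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
>>> result = model(doc)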
-
-🧑🔬 Build & train your predictor¶
+
+
+
+Model zoo¶
+
+Text detection models¶
-👷 Compose your own end-to-end OCR predictor: mix and match detection & recognition predictors (all-pretrained)
-👷 Fine-tune or train from scratch any detection or recognition model to specialize on your data
-
-
-
-🧰 Implemented models¶
-
-Detection models¶
-
-
-DB (Differentiable Binarization), “Real-time Scene Text Detection with Differentiable Binarization”.
-LinkNet, “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”.
+DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
+LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
+FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
-
-
-Recognition models¶
-
-
-SAR (Show, Attend and Read), “Show, Attend and Read:A Simple and Strong Baseline for Irregular Text Recognition”.
-CRNN (Convolutional Recurrent Neural Network), “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”.
+
+Text recognition models¶
+
+SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
+
+MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
+ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
+PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
-
-
-🧾 Integrated datasets¶
-
-
+
+Supported datasets¶
+
FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
+SROIE from ICDAR 2019.
+IIIT-5k from CVIT.
+Street View Text from “End-to-End Scene Text Recognition”.
+SynthText from Visual Geometry Group.
+SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
+IC03 from ICDAR 2003.
+IC13 from ICDAR 2013.
+IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
+MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
+IIITHWS from “Generating Synthetic Data for Text Recognition”.
+WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.
-
-
-
-
-Getting Started¶
-
-- Installation
-
-
-
-
-Contents¶
-
+
+
+
+
+
+
+
+